Preface


This volume is about the foundation of mathematics, the way it was conceptualized 
by Russell and Whitehead [56], Hilbert (and Bernays) [22], and Bourbaki¹ [2]:
Mathematical Logic. This is the discipline that, much later, Gries and Schneider [17] 
called the “glue” that holds mathematics together. 

Mathematical logic, on one hand, builds the tools for mathematical reasoning 
with a view of providing a formal methodology—i.e., one that relies on the form or 
syntax of mathematical statements rather than on their meaning—that is meant to be 
applied for constructing mathematical arguments that are correct, well documented, 
and therefore understandable. 

On the other hand, it studies the interplay between the written structure (syntax) 
of mathematical statements and their meaning: Are the theorems that we prove by 
pure syntactic manipulation true under some reasonable definition of true? Are there 
any true mathematical statements that our tools cannot prove? The former question 
will be answered in the affirmative later in this book, while the latter question, 
interestingly, has both “no” (Gödel’s completeness theorem [15]) and “yes” (Gödel’s
(first) incompleteness theorem [16]) answers!² Both of these answers are carefully
reconstructed in the Appendix to Part II.

¹ “Nicolas Bourbaki” is the pen-name of a team of top mathematicians who are responsible for the
monumental work, “Éléments de Mathématique”, which starts with logic as the foundation, or “connecting
glue” in the words of [17], and then proceeds to extensively cover fields such as set theory, algebra,
topology, analysis, measure, and integration.

Much has been written on logic, which is nowadays a mature mathematical body 
of knowledge and research. The majority of books written with upper-level under- 
graduate audiences (and beyond) in mind deal mostly with the metamathematics or 
metatheory of mathematical logic; that is, they view logic as a mathematical object 
and study its abilities and limitations (such as incompleteness), and the theory of 
models, giving short shrift to the issue of using logic as a tool. 

There are currently only two books that the author is aware of that chronologically 
precede this volume and address almost exclusively the interests and needs of the 
user of logic. Both present the subject as a set of tools with which one can do 
mathematics (or computer science, or philosophy, or anything else that requires 
reasoning) rigorously, correctly, formally, and with adequate documentation: [2] and
[17]. 

The former tersely introduces logic in its first chapter with a view of applying 
it as a rigorous tool for theorem generation in the numerous (and very advanced) 
chapter-volumes that follow (from set theory and algebra to topology, and measure 
and integration). 

The latter, a much more recent entry in the literature, is an elementary text (aimed 
at undergraduate university curricula in computer science) in the same spirit as 
Bourbaki’s, which proposes to use logic, once again, as a tool to prove theorems 
of interest to computer scientists. Indeed, the second part of [17] is on discrete 
mathematics in the sense that this term, more or less, is understood by most computer 
science departments today. 

Similarly, the volume in your hands aims to thoroughly teach the use of logic 
as a tool for reasoning appropriate for upper-level undergraduate university students 
in fields of study such as computer science, mathematics, and philosophy. For 
the first group, this is an introduction to formal methods—a subject that is often 
included in computer science curricula—providing the student with the tools, the 
methodology, and a solid grounding on technique. As the student advances along 
the computer science curriculum, this volume’s toolbox on formal methods will find 
serious applications in courses such as design and analysis of algorithms, theory of 
computation, computational complexity, software specification and design, artificial 
intelligence, and program verification. 

The second group’s curriculum, at the targeted level, in addition to a solid course 
on the use of logic, will normally also require a more ambitious inquiry into the
capabilities and limitations of logic viewed as a mathematical tool. This further trip
into the metatheory of logic will traditionally want to delve into two foundational
gems beyond those of soundness and propositional completeness. Both are due to
Gödel, namely, his completeness and incompleteness theorems. The Appendix to
Part II settles the former in full detail, and also offers a proof of the latter (actually,
only of the first incompleteness theorem³), basing it on an inherent limitation of
“general” models of computation, in the process illuminating the connection between
the phenomena of uncomputability and unprovability.

² It is not that Gödel was of two minds on the issue. Rather, the question can be made precise in two
different ways, and, correspondingly, one gets two different answers. One way is to think of “universal”
truth, such as the truth of “x = x”. Universal truth is completely certifiable by the syntactic tools. The
other is to think of truth in the “standard models” of some “rich” theories (rich in what one can formulate
and prove in them, that is). Formal (Peano) arithmetic, that is, the axiomatic system that attempts to
explain the set of natural numbers and the arithmetic operations and relations on it (the standard model),
is such a rich theory. Gödel showed the existence of true arithmetical statements in the model that cannot
be syntactically proved in the axiom system of Peano arithmetic. One such true statement says, “I am not
a theorem.”

As a side-effect of constructing a self-contained proof of the first incompleteness 
theorem, we had to develop a fair amount of computability theory that will be of 
direct interest to all the readers, in particular, those in computer science. 

The third group of readers, philosophy majors, traditionally require less coverage 
in a course in logic than what I have presented here; however, philosophy curricula 
often include a course in symbolic logic at an advanced undergraduate level, and this 
volume will be an apt companion for such studies. 

The book’s aim to teach the practice of logic dictates that it must look and feel 
much like a serious text on programming. In fact, I argue at the very beginning of
the first chapter that learning and practicing logic is a process like that of learning
and practicing programming. As a result, the emphasis is on presenting a multitude
of tools, and on using these tools in many fully written and annotated proofs, an
approach that is intended to enhance the reader’s effectiveness as a “prover”, giving
him⁴ many examples for emulation.

There are some important differences—despite the superficial similarities that the 
common end-aims impose—between the approach and content in this volume and 
that in its similarly aimed predecessors [2] and [17]. 

Bourbaki provides tools for use by the “practicing mathematician” and does not 
bother with any semantic issues, presumably on the assumption that the mathemati- 
cian knows full well how the syntactic and semantic notions interact and relate, and 
has an already well developed experience and ability to use semantic methods toward 
finding counterexamples when needed. He merely introduces and uses the so-called 
Hilbert style of proofs (cf. 1.4.12) that is most commonly used by mathematicians. 

The text of [17] is equally silent about the interplay between syntax and semantics, 
and about any aspect of the metatheory, and refers to Hilbert-style proofs only tangen- 
tially. The authors prefer to exclusively propound the equational (or calculational) 
proof style (cf. Section 2.2), originally proposed in [11]. Moreover, unlike [2], they 
take liberties with their formalism.⁵ For example, even though they argue in their
introduction in favor of using formal methods in practical reasoning, they distance
themselves from a true syntactic approach, especially in their Chapter 8, where facts
outside logic taken from algebra and number and set theory are presented as axioms
of predicate logic.

³ The second incompleteness theorem, that the freedom of contradiction of “rich” axiomatic systems such
as Peano arithmetic cannot be proved “from within”, is beyond the scope of this volume. Indeed, the only
complete proofs in print for this result are found in [22], Vol. II, and in [53].

⁴ His, him, he and related terms that grammatically indicate gender are, by definition, gender neutral in
this volume.

⁵ A formalism in the context of mathematical logic is any particular way logicians structure their formal
methods.

While the approach in this volume is truly formal, just like Bourbaki’s, it is 
not as terse; we are guilty of the opposite tendency! We also believe that, unlike 
the seasoned practitioner, the undergraduate mathematics, computer science, and 
philosophy students need some reassurance that the form-manipulation proof-writing 
tools presented here indeed prove (mathematical) “truths”, “all truths”, and “nothing 
but truths”. This means that we cannot run away from the most basic and fundamental 
metatheoretical results. After all, every practitioner needs to know a few things about 
the properties of his tools; this will make him more effective in their use. 

Thus I include proofs of the soundness (meta)theorems for both propositional and 
predicate logics (this addresses the “truths”, and “nothing but truths” part) and also 
the two “completeness” results, of propositional and predicate logics (this is the “all 
truths” part). However, to maintain both the emphasis on the use of logic and an 
elementary but rigorous flow of exposition I have delegated the much-harder-to-prove
completeness metatheorem of predicate logic ([15]) to a sizable appendix at the end
of the book.

Why are soundness and completeness relevant to the needs of the user? Completeness
of propositional logic, along with its soundness, gives us the much-needed license
(in the interest of user-friendliness) to mix semantic and syntactic tools in formal
proofs without sacrificing mathematical rigor. Indeed, this license (to use propositional
semantic tools) is extended even in predicate logic, and is made possible
by the trick of adding and removing quantifiers (“for all” and “for some”). On the 
other hand, soundness of the two logics allows the user to disprove statements by 
constructing so-called countermodels. 

There are also quite a few simpler metatheoretical results, beyond soundness and 
completeness, that we routinely introduce and prove as needed about formulae (e.g., 
about their syntax) and about proofs (e.g., the validity of principles of proof such 
as hypothesis strengthening, deduction theorem, and generalization), using the basic 
tool of induction (essentially on formula and proof lengths). 

The Hilbert style of proving theorems is prevalent in the mathematical literature 
and is prominently displayed and practiced in this volume. On the other hand, the 
equational-style of displaying proofs has been gaining in popularity especially in 
computer science curricula. It is a style of proof that seems well adapted to areas in 
computer science such as software engineering (in particular, in the field of software 
engineering requirements) and program verification. 

For the above reason, equational-style proofs receive a thorough exposition in this
volume. It is my intention to endow the reader with enough machinery that will 
make him proficient in both styles of proof, but more importantly, will enable him to 
choose the style that is best suited to writing a proof for any particular theorem. 


In terms of prior knowledge (tools) needed to cope with this volume the reader 
should at least have high school mathematics (but I expect that this includes math- 
ematical induction and some basic algebra). A degree of mathematical maturity, 
but no specific additional knowledge, of the kind an upper-level undergraduate will 
normally have will also be handy. 
A word on pedagogical approach. I repeatedly taught the material included here
to undergraduate computer science students at York University in Toronto, Canada. 
I think of this book as the record of my lectures. I have endeavored to make these 
lectures user-friendly, and therefore accessible to readers who do not have the benefit 
of an instructor’s guidance. Devices to that end include anticipation of questions, 
promptings for the reader to rethink an issue that might be misunderstood if glossed 
over (“pauses”), and numerous remarks and examples that reflect on a preceding definition
or theorem.

Using the symbols “[winding-road sign]” and “[double winding-road sign]”, I am marking those
passages that are very important, and those that can be skipped at first reading, respectively.

My fondness for footnotes is surely evident (a taste acquired long ago, when I was 
studying Wilder’s excellent Introduction to the Foundations of Mathematics [57]).

I give (mostly) very detailed proofs, as I know from experience that omitting 
details normally annoys students. Moreover, I have expectations that students will 
achieve a certain style, and effectiveness, in proofs. The best way to teach them to do 
so is by repeatedly giving examples how. In turn, students will have the opportunity 
to test and further their understanding by doing several exercises, some of which are 
embedded in the text while others appear at chapters’ end (a total of more than 190 
exercises). 


Book structure. The book is in two approximately equal-length parts, one on 
Boolean (or propositional) logic and one on predicate logic. A thorough exposition of 
Boolean logic pedagogically prepares the reader for the much more difficult predicate 
logic, at the same time endowing him with several tools that are transferable such as 
the ubiquitous Post’s theorem (propositional completeness) and deduction theorem. 

Part I is in three chapters. Chapter 1 starts with the basic formation rules of propositional
(Boolean) formulae, that is, the syntax, and introduces “induction on formulae” as
a tool via which we can prove facts about syntax. It proceeds with Boolean semantics 
(truth tables) and then continues with the concept of formal proofs—those effected 
via purely syntactic manipulation—from axioms and rules of inference. Chapter 2 
is a veritable database of proofs and theorems, presenting several proofs and proof 
techniques, including the deduction theorem. Both the equational and Hilbert style 
of proof layouts are used extensively. Chapter 3 revisits semantics, and proves 
both the soundness and completeness (Post) theorems, thus demonstrating the full 
equivalence and interchangeability of the semantic and syntactic proof techniques 
in Boolean logic. It concludes with an exposition of the technique of resolution in 
Boolean logic. 

Part II on predicate logic (or calculus) contains five chapters and a lengthy Ap- 
pendix. Predicate calculus is introduced as an extension of the logic of Part I, so that 
every tool that we obtained in Part I is still usable in Part II. This part’s first chapter, 
Chapter 4, is about the syntax of formulae, and introduces the axioms, the rules of 
inference, and the concept of proof, extending without discarding anything of the 
corresponding concepts of Part I. Chapter 5 simplifies the metatheoretical arguments 
by introducing a simpler-to-talk-about logic, equivalent to ours; that is, a logic with 
a simpler metatheory. Chapter 6 proves and extensively uses powerful rules of inference
that were not postulated up front: techniques for adding and removing the
universal quantifier, powerful Leibniz rules, and techniques for adding and removing
the existential quantifier. Our version of predicate calculus, as is common in the 
literature nowadays, includes equality (=). Chapter 7 advances some basic properties 
of equality as these flow from the axioms and the rules of inference. 

Chapter 8 is a “working” first approximation to Tarski-like semantics and proves 
(in detailed outline) the soundness theorem for predicate calculus. This is an impor- 
tant tool toward constructing counterexamples, or countermodels as we prefer to call 
them, aiming to show that certain predicate logic formulae are not provable. 

The Appendix at the very end does several things: It revisits Tarski semantics 
that were naively presented in Chapter 8, proves soundness again, this time totally
rigorously, and also proves Gödel’s completeness theorem. It then introduces
computability, that is, the part of logic that makes the concepts of algorithm, computation,
and computable function mathematically precise. In this particular approach
to computability, I am using the programming language known in the literature as
the Shepherdson-Sturgis ([44]) unbounded register machines (URMs). The topics 
included constitute the very foundation of the theory of computation and they will 
be of interest not only to mathematics readers but also to those in philosophy and,
especially, in computer science, who will find ample supplemental material for their 
theory of computation courses. These include partial computable functions, prim- 
itive recursive functions, a complete characterization in number-theoretic terms of 
the partial functions computable by URMs, the normal form theorems, the “Kleene 
predicate” and a “universal” URM, computable and semi-computable relations and 
their behavior in connection with Boolean operations and quantification, computably 
enumerable relations, unsolvability, verifiers and deciders, first-order definability, 
and the arithmetical relations. This machinery will next allow us to tackle Gödel’s
first incompleteness theorem. This we prove by basing the proof on the nonexistence
of a URM program that solves the following problem (halting problem) for any choice
of x and y: “Will program x ever terminate if its input is y?”


Suggested coverage. A computer science curriculum in formal logic will probably 
cover everything but the Appendix. The course MATH1090 at York University, 
especially designed for computer science majors, does exactly that. However, a hybrid
course in logic and computability, often included in computer science curricula, will 
adjust its pace (e.g., going faster through Part I) to include the computability and 
Gödel incompleteness topics of the Appendix. A mathematics major will typically
see his first course in logic in an upper-undergraduate year. His syllabus will likely 
require that the book be studied from cover to cover (again, going fast through Part I). 
A philosophy major’s needs in a course in logic are harder to fit to a prescribed 
template. Advanced students will likely find all of Part I relevant along with chapters 
4-6 of Part II. They will also find a high degree of relevance in the computability and 
Gödel incompleteness topics of the Appendix.


GEORGE TOURLAKIS 


Toronto 
June 2008 
CHAPTER 1


THE BEGINNING 


Mathematical logic, or as we will simply say, “logic”, is the science of mathematical 
reasoning. Its core consists of the study of the form, meaning, use, and limitations of 
logical deductions, the so-called proofs. 

This volume, which is aimed at upper-level undergraduate university students 
who follow a course of study in computer science, mathematics, or philosophy, will 
emphasize mainly the use of proofs—it is written with the interests of the user in 
mind. 


1.0.1 Remark. (Before we Begin) The symbol “[winding-road sign]”⁶ goes at least as far back as
the writings of Bourbaki. It has been made widely accessible to authors who like to
typeset their writings themselves through the typesetting system of Donald Knuth
(known as “TeX”).

I use these “road signs” as follows: A passage enclosed between two single “[winding-road sign]”
symbols is purported to be very noteworthy, so please heed!

On the other hand, a passage enclosed between two double signs (“[double winding-road sign]”)
means two things.

The bad news is that it is rather difficult, or esoteric, or both. The good news is
that you do not need to understand (or even read) its contents in order to understand
all that follows. It is only there in the interest of the “demanding” reader. Such
“doubly dangerous” passages allow me to digress without injuring continuity: you
can ignore these digressions! □

⁶ This symbol is a stylized typographical version of the “(dangerous) winding-road” road sign.


Learning to use logic, which is what this book is about, is like learning to use a 
programming language. 

In the latter case, probably familiar to you from introductory programming courses, 
one learns the correct syntax of programs, and also learns what the various syntactic 
constructs do—that is, their semantics. After that, one embarks—for the balance of 
the programming course—on a set of increasingly challenging programming exer- 
cises, so that the student becomes proficient in programming in said language. 

We will do an exactly analogous thing in this volume: We will learn to write 
proofs, which are nothing else but annotated sequences of formulae and are similar 
to computer programs in terms of syntactic structure—the annotations playing a role 
closely similar to that of comments in computer programs. 

But to do that, we need to know, to begin with, what are the rules of correctly 
writing down a formula and a proof! We have to start with the syntax of these 
objects—formulae and proofs—precisely as it is done in the case of programming 
and its related objects, the programs. 

Thus, we will begin with learning the syntax of the logical language, that is, what 
syntactically correct formulae and proofs look like. We will also learn what various 
syntactic constructs “say” (semantics). For example, we will learn that a formula 
makes a “statement”. A proof also makes a statement, that every formula in it is true 
in some very intuitively acceptable sense. 

We will learn that correctly written proofs are finite and “checkable” means toward 
discovering mathematical “truths”. We will also learn via a lot of practice how to 
write a large variety of proofs that certify all sorts of useful truths of mathematics. 
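
To make “checkable” concrete, here is a minimal Python sketch of a proof checker. It is an illustration only, not the proof system of this volume: the tuple encoding of formulae, the annotation format, and the choice of modus ponens as the sole rule of inference are all assumptions made for the example.

```python
# Illustrative sketch only; the encoding and the single rule are assumptions.
# A formula is a variable name (str) or a tuple ("implies", A, B).
# A proof is a list of (formula, annotation) pairs, where the annotation is
#   ("hyp",)      the formula is one of the given hypotheses, or
#   ("mp", i, j)  the formula follows by modus ponens from the earlier
#                 lines i (some A) and j (A implies the formula), 0-based.

def check_proof(hypotheses, proof):
    """Return True if every line of the proof is justified by its annotation."""
    lines = []  # formulae verified so far
    for formula, note in proof:
        if note[0] == "hyp":
            ok = formula in hypotheses
        elif note[0] == "mp":
            _, i, j = note
            ok = (0 <= i < len(lines) and 0 <= j < len(lines)
                  and lines[j] == ("implies", lines[i], formula))
        else:
            ok = False  # unknown annotation
        if not ok:
            return False
        lines.append(formula)
    return True


HYPOTHESES = {"p", ("implies", "p", "q"), ("implies", "q", "r")}

GOOD_PROOF = [
    ("p", ("hyp",)),
    (("implies", "p", "q"), ("hyp",)),
    ("q", ("mp", 0, 1)),
    (("implies", "q", "r"), ("hyp",)),
    ("r", ("mp", 2, 3)),
]

BAD_PROOF = [
    ("p", ("hyp",)),
    ("r", ("mp", 0, 0)),  # line 0 is not "p implies r"
]
```

The checker inspects only the form of each line, never its meaning, and it stops after a single pass over the (finite) proof. The annotations do the job that comments do in a program, except that here they are verified.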

The above task, writing proofs—or “programming in logic” if you will—is our 
main aim. This will equip you with a toolbox that you can use to discover or certify 
truths. It will be handy in your studies in computer science, and in whatever area of 
study or research you embark upon and where reasoning is required. 

However, we will also look at this toolbox, the logic, as an object of study and 
study some of its properties. After all, if you want to take up, say, carpentry, then you 
need to know about tools such as hammers—their properties (e.g., hard and heavy) 
and limitations (e.g., unfriendly to fingers). 


When using the toolbox to prove theorems, you work within logic. On the other 
hand, when studying the toolbox, you work in logic’s metatheory (in metalogic) to 
talk and reason about logic. 

People often do this kind of study with programming languages, looking at them 
as objects of study rather than as instruments to write programs with. For example, 
in an advanced course on the comparative study of programming languages one 
looks at several programming languages and compares them for features, suitability 
for certain programming tasks (for any specific task some are more suitable than
others), limitations, etc.

Here is another analogy: In the “real world” that we live in, one builds flight 
simulators, which we use to simulate flying an airplane, and in the process we 
learn how to do so. The real world where the simulator is built is the simulator’s 
metatheory, where we can, among other things, study the properties and limitations of 
simulators and compare several simulators for features such as relative “power” (i.e., 
how effective or realistic they are), etc. Similarly, formal logic is built within “real 
mathematics”, as we will see in the next section. It, too, is a “simulator” employed 
to write formal proofs that certify the truth of mathematical statements. These proofs 
imitate the kind of informal proofs one typically employs in informal mathematics 
but do so within a precisely specified system of notation (called language), rules, and 
assumptions. Thus, using formal logic is a means to learn how to write proofs—and 
not only formal proofs!—just as using a flight simulator is a means of learning how 
to fly a real plane. The metatheory of logic—the “real mathematics” —addresses 
questions among the deepest of which is the question of how far formal logic can go 
in discovering mathematical truths. 


Let us next look more closely at the similarity between programming languages 
and programming on one hand and logical languages and proving on the other, and 
argue that, similar as the two activities may be, the second one is a bit easier! 


(1) In programming, you use the syntactic rules to write a program that solves a 
problem. 


(2) In logic, you use the syntactic rules to write a proof that establishes a theorem. 


In the latter task you are done as soon as the proof ends. At the end of the proof 
you have your theorem, exactly as stated. 

In the former task, programming, it is not enough to just write a program! You 
next have to convince your boss, or your instructor, that the program indeed solves the 
problem; that it is “semantically correct” with respect to the problem’s specification. 

Note that in proving a theorem you have a purely syntactic task. Once your 
correctly written proof ends with the theorem you were trying to prove, you are done. 
There is no messing about with semantics. 

There is another reason why programming is harder than proving theorems: Pro- 
gramming has to be painstakingly precise because it involves your writing instruc- 
tions for a dumb machine to “understand” and follow. You must be absolutely and 
pedantically clear in your instructions. 

On the other hand, you address a proof to a human who knows as much as you do, 
or more, about the subject. This human will in general accommodate a few shortcuts 
that you may want to take in your presentation. 

In short, proofs are read by “intelligent” humans, while programs are read by 
“dumb” computers. We need to work really hard to speak at the level of the latter. 

Will you ever need to deal with semantics in logic? Yes! Semantics is useful 
when you want to disprove (or refute) something, that is, to prove that it is a false 
statement, a fallacy. We will talk about semantics later, three times: once under
Boolean logic, once under predicate logic, and one last time in the Appendix. 
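
In the Boolean case, refuting a formula amounts to exhibiting one assignment of truth values that makes it false. The following Python sketch searches for such an assignment by brute force. The tuple encoding of formulae and the function names are assumptions of the example, not notation from this volume.

```python
from itertools import product

# Assumed encoding: a variable is a str; compound formulae are tuples
# ("not", A), ("and", A, B), ("or", A, B), ("implies", A, B).

def variables(formula):
    """Collect the variable names that occur in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(part) for part in formula[1:]))

def evaluate(formula, state):
    """Compute the truth value of a formula under a truth assignment."""
    if isinstance(formula, str):
        return state[formula]
    op, *args = formula
    vals = [evaluate(a, state) for a in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    if op == "implies":
        return (not vals[0]) or vals[1]
    raise ValueError(f"unknown connective: {op}")

def find_counterexample(formula):
    """Return an assignment that falsifies the formula, or None if none exists."""
    names = sorted(variables(formula))
    for values in product((False, True), repeat=len(names)):
        state = dict(zip(names, values))
        if not evaluate(formula, state):
            return state
    return None
```

For instance, `find_counterexample(("implies", ("or", "p", "q"), "p"))` returns an assignment with `p` false and `q` true, which shows that “(p or q) implies p” is not a tautology.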

There are many methodologies or paradigms (and corresponding programming 
languages suitable for the task) for writing programs. For example (add the word 
programming after each italicized keyword), procedural (Algol, Pascal, Turing), 
functional (LISP), logic (Prolog), and object-oriented (C++, Java). Most computer 
science departments will expose their students to many of the above. 

Similarly there are several methodologies for writing proofs. For example (add 
the word style after each italicized keyword), equational (the one favored by [17]), 
Hilbert (favored by the majority of the mathematics, computer science, and logic 
literature), Gentzen’s natural deduction, etc. 

My aim is to assist the reader to become an able user of the first two styles: the 
equational and the Hilbert style of proof. 

In both methodologies, an important required component is the systematic anno- 
tation of the proof steps. Such annotation explains why we do what we do, and has 
a function similar to that of comments in a program. 

Okay; one can grant that a computer science student needs to learn programming. 
But logic? You see, the proper understanding of propositional logic is fundamental 
to the most basic levels of computer programming, while the ability to correctly use 
variables, scope, and quantifiers is crucial in the use of loops, and subroutines, and 
in software design. Logic is used in many diverse areas of computer science, includ- 
ing digital design, program verification, databases, artificial intelligence, algorithm 
analysis, computability, complexity, and software specification. Besides, any science 
that requires you to reason correctly to reach conclusions uses logic. 

When one is learning a programming language, one often starts by learning a 
small subset of the language, just to smooth the learning curve. Analogously, we 
will first learn—and practice—a subset of the logical language. This we will do not 
due to some theoretical necessity, but due to pedagogical prudence. This particular, 
“easy” subset of (the “full”) logic that we will embark upon learning goes by many 
names: Boolean logic, propositional logic, sentential logic, sentential calculus, and 
propositional calculus. 

The “full logic” we will call by any of the names predicate calculus, predicate 
logic, or first-order logic. 

I like the calculus qualifier. It connotes that there is a precise way to “calculate” 
within logic. It emphasizes that building proofs is an algorithmic and precise process, 
just like programming. 


Indeed, it turns out that you can write a program, say, in Pascal, that will accept no 
input, but if it is allowed to run forever it will print all the theorems of logic⁷ (and not
just those of the Boolean variety)—and never print a non-theorem!—in some order, 
possibly with some repetitions (cf. A.4.7 on p. 270). 


⁷ We will soon appreciate that there are infinitely many theorems in logic.

Equivalently,⁸ we can write a program that is a theorem verifier. That is, given as
input a theorem, the program will verify that it is so, in a finite number of steps. If 
the input is a non-theorem, our verifier will decline an answer—it will run forever. 
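
The relation between a theorem printer and a theorem verifier can be sketched in Python for a toy Boolean language. This is not the construction given in the Appendix: the sketch handles only formulae built from two variables with “not” and “implies”, and it replaces proof search by a truth-table test, relying on the fact that in Boolean logic the theorems are exactly the tautologies (cf. 3.2.1). The encoding, the function names, and the `max_steps` cut-off (added only so that one can experiment safely) are all assumptions.

```python
from itertools import count, islice, product

VARS = ("p", "q")  # assumption: a two-variable toy language

def formulas_of_size(n):
    """Yield every formula with exactly n connectives, as nested tuples."""
    if n == 0:
        yield from VARS
        return
    for sub in formulas_of_size(n - 1):
        yield ("not", sub)
    for left_size in range(n):
        for left in formulas_of_size(left_size):
            for right in formulas_of_size(n - 1 - left_size):
                yield ("implies", left, right)

def evaluate(formula, state):
    """Compute the truth value of a formula under a truth assignment."""
    if isinstance(formula, str):
        return state[formula]
    if formula[0] == "not":
        return not evaluate(formula[1], state)
    _, left, right = formula
    return (not evaluate(left, state)) or evaluate(right, state)

def is_tautology(formula):
    """Truth-table test over all assignments to VARS."""
    return all(evaluate(formula, dict(zip(VARS, values)))
               for values in product((False, True), repeat=len(VARS)))

def enumerate_theorems():
    """Takes no input; yields Boolean theorems forever, smallest first."""
    for n in count():
        for formula in formulas_of_size(n):
            if is_tautology(formula):
                yield formula

def verify(candidate, max_steps=None):
    """Scan the enumeration until the candidate shows up.

    With max_steps=None this runs forever on a non-theorem, exactly as the
    text describes. With a cut-off it returns None, meaning "no verdict".
    """
    stream = enumerate_theorems()
    if max_steps is not None:
        stream = islice(stream, max_steps)
    for theorem in stream:
        if theorem == candidate:
            return True
    return None
```

Calling `verify(("implies", ("not", ("not", "p")), "p"))` returns `True` after finitely many steps, while `verify(("not", "p"), max_steps=50)` gives up without a verdict. Note that giving up is not the same as answering “no”.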

Thus, proving theorems is a mechanical process! 


Digression: The above assertion is an example of a true assertion about the logic, not 
one that we can prove using exclusively the tools of logic. It is a metatheorem
of logic as we say, not a theorem. 

The proof of this metatheorem requires techniques much more powerful than (indeed,
external to) those that the logic provides. We will prove this metatheorem in
the Appendix to Part II (A.4.6). 

So metatheorems are truths about the logic that we prove with tools external to 
the logic, while theorems are truths that the logic itself is capable of proving. 

There is some danger that the above statement, “proving theorems is a mechanical 
process”, may be misinterpreted by some as one advocating that we build proofs by 
mindlessly shuffling symbols. Nothing is further from reality. 

The statement must be understood precisely as written. It says that there is a 
“mindless” way, a programmable way, to generate and print all possible theorems 
of logic, and, equivalently, also a programmable way to verify all theorems, which, 
however, refuses to verify any non-theorem by “looping” forever when presented 
with any such as input. 

But it is not a recipe for how we ought to behave when we write proofs. This is 
not the way a mathematician, or you or I, go about proving things—mindlessly. In 
fact, if we do not understand what is going on, we cannot go too far. 

Moreover, interesting, even important, as this result (about the existence of theorem 
verifiers) may be theoretically, it is useless practically, as we further discuss below. 

Our task is different. In general, we are more inquisitive. Given an arbitrary 
(mathematical) statement, we do not know ahead of time if it is a theorem or not. 
This italicized statement, the so-called decision problem of logic, is what we normally 
are interested in. Thus, our “verifier” is not very helpful, for if the statement that we 
present it as input is not a theorem, then the verifier will run forever, not giving an
answer. 

Hmm. Can we not write a decider for logic? The answer to this is interesting, but 
also reassuring to mathematicians (and all theorists): Their jobs are secure! 


(1) For Boolean logic, we can, since the question “Is this statement a theorem?” 
translates to “Is this statement a tautology?” (cf. 3.2.1). The latter can be settled 
algorithmically via truth tables. But there is a catch: Checking a formula
(the formal counterpart of a “statement”) for tautology status is an unfeasible
problem.⁹ So we can do it in principle, but this fact is devoid of any practical
value.


⁸ That this formulation of the claim is equivalent to the preceding one is a standard result of computability.
Cf. Appendix to Part II, Remark A.3.91 on p. 262.

⁹ The term unfeasible (also intractable) has a technical connotation in complexity theory: It means a
problem for which we know of no algorithm that runs in polynomial time as a function of the input 
(2) For predicate logic, the answer is more pleasing to mathematicians.


First, there exists no decider for this logic if we expand it minimally so that it can 
reason about the theory of natural numbers (this is Alonzo Church’s theorem, 
[3, 4]). 


Second, even if one were to be satisfied simply with a verifier for theorems, then 
we still would have no general solution of any practical value in hand. Indeed, 
again considering the logic augmented so that it can “do number theory”, any 
chosen verifier V for this logic would be extremely slow in providing answers 
in the following precise sense: For any choice of a step-counting function f(n),
there is an infinite subset, S, of the set of theorems of number theory, such that
each member of S that is composed of n symbols requires, for its
verification, more than f(n) steps to be performed by V.¹⁰ This is a result of
Hartmanis ([19]).
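
For case (1), the truth-table method can be sketched as follows. This is a minimal illustration, not an efficient procedure: formulas are represented, by assumption, as Python functions of a truth assignment (the names `is_tautology`, `excluded_middle`, and `implication` are mine), and the loop inspects all 2^n rows for n variables, which is exactly the exponential cost that makes the method impractical.

```python
from itertools import product

def is_tautology(formula, variables):
    """Return True iff formula is true in every row of its truth table.

    formula   -- function taking a dict {variable name: bool} and returning a bool
    variables -- list of the variable names occurring in the formula
    """
    rows = product([False, True], repeat=len(variables))   # 2**n rows
    return all(formula(dict(zip(variables, row))) for row in rows)

# p ∨ ¬p is true in both rows; p → q fails when p is true and q is false.
excluded_middle = lambda v: v["p"] or not v["p"]
implication = lambda v: (not v["p"]) or v["q"]

print(is_tautology(excluded_middle, ["p"]))
print(is_tautology(implication, ["p", "q"]))
```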


Let us stop digressing for now. In the next section we begin the study of the 
sublogic known as propositional calculus. 


1.1. BOOLEAN FORMULAE 


We will continue stressing the algorithmic nature of the discipline of proving, just as 
it is the case in the discipline of programming. 

In particular, just as in serious programming courses the programming language 
is introduced via precise formation rules that allow us to write syntactically correct 
programs, we will be every bit as serious by introducing very precisely the rules for 
writing syntactically correct (1) formulae and (2) proofs. 

Once again, the syntax of the logical language is much simpler to describe than 
that of any commercially available programming language. 

So, how does one build—i.e., what are the rules for writing down correctly— 
formulae? 

Continuing with the programming analogy, you will recall that to define a pro- 
gramming language, i.e., the syntax of its programs, one starts with the list of 
admissible symbols, the so-called alphabet. In some languages, the alphabet in- 
cludes symbols such as “3, 4,0, [, A, B,c,d, £,+,x,—” and “keywords”—that is, 
multiple-character symbols—such as if, then, else, do, begin. 

Similarly, in Boolean logic, we start with the basic building blocks, which collec- 
tively form what is called the alphabet (for formulae). Namely, 


length—or worse, we know that such an algorithm does not exist. In this case it is the former. However, 
there is a connection with the so-called “P vs. NP” open question (see [5]). If a polynomial algorithm
that recognizes tautologies does exist, then the open problem is settled as “P = NP”, something that the 
experts in the field consider highly unlikely. The truth table method runs in exponential time. 

¹⁰ For example, consider f(n) = 2^(2^(2n)). If we think of f(n), for each n, as representing picoseconds of
run time of the verifier V (1 picosecond is 10^(-12) seconds), then every member of S of length more than
4 symbols will require the verifier V to run for more than 5.70045 × 10^288 years!


A1. Symbols for variables, called the Boolean or propositional or sentential
variables. These are p, q, r, with or without primes or subscripts (i.e., symbols such as p′ and q₁₃
are also symbols for variables).


We often need to write down expressions such as “A[p := B]”, to be defined
later (1.3.15), but do not wish to restrict them to the specific variable p. Nor can
we say things such as “for any Boolean variable p we consider A[p := B]...”
as there is only one specific p!


We get around this difficulty by employing so-called metavariables or syntactic
variables, i.e., symbols outside the alphabet that we can use to refer to or point,
generally, to any variable. We adopt the names for those to be the boldface
p, q, r with or without primes or subscripts. Thus a boldface p names any variable:
p, q, r″, and so on. Rarely if ever in this volume will we need to use more
Boolean metavariables than these two: p, q.


We can now use the expression “for every Boolean variable p we consider
A[p := B]...” referring to what p names rather than to p itself. Two
analogous examples are, from algebra, “for every natural number n” (n is not
a natural number!) and, from programming, where we might say about Algol,
“for each variable x, the instruction x := x + 1 means to increase the value of
x by one.” Again, x is not a variable of Algol; X13 and YX799, though, are. But
it would be meaningless to offer the general statement “for each variable X13,
the instruction X13 := X13 + 1, etc.” since X13 is a specific variable of the
Algol syntax. The programming language metavariable x allows us to speak of
all of Algol’s variables collectively!


On the other hand, the expression “for every Boolean metavariable” refers to 
the set of metavariables themselves, {p, q, r″, ...}, and will be rarely, if ever,
used. The expression “for every Boolean metavariable p” is as nonsensical as 
“for every Boolean variable p”. 


A2. Two symbols for Boolean constants, namely ⊤ and ⊥. These are pronounced
variously in the literature: verum (also top, or symbol “true”) and falsum (also
bottom, or symbol “false”¹¹).


A3. Brackets, namely, ( and ).

A4. “Boolean connectives”, namely, the symbols listed below, separated by commas:


¬, ∧, ∨, →, ≡          (i)

Let us denote by V the alphabet consisting of the symbols described in A1-A4.


¹¹ Usually, the qualifier symbol is dropped and then the context is called upon to distinguish between
“true/false” the symbols vs. “true/false” the Boolean values of the metatheory (introduced in Section 1.3).
In particular, cf. Definition 1.3.2 and Remark 1.3.3.
1.1.1 Remark. (1) Even though I say very emphatically that p, q, r, etc., and also
⊤ and ⊥, are just symbols¹² (the former standing for variables, the latter for
constants), I will stop using the qualification symbols, and just say variables
and constants. This entails an agreement: I always mean to say symbols, I just don’t
say it.

(2) Most variable symbols are formed through the use of “subsymbols” (such as
0, 1, 2, ′) that are not members of the alphabet V themselves; e.g., p″₁₉₉₃₄. This
does not detract from the fact that each variable (name) is a single symbol of V,
entirely analogously with, say, the keywords of Algol if, then, begin, for, etc.

(3) Readers who have taken an elementary course in logic, perhaps in the context of a
programming course, may have learned that ¬, ∨ are the only connectives one really
needs since the rest can be expressed in terms of these two. Thus we have deliberately
introduced redundancy in the adopted set of connectives (i) above. This choice in
the end will prove to be user-friendly and will serve our aim to give a prominent role
to the connective ≡ in the axioms and in rules of inference (Section 1.4). □


1.1.2 Definition. (Strings or Expressions; Substrings) We call a string (also word 
or expression), over a given alphabet, any ordered sequence of the alphabet’s symbols, 
written adjacent to each other without any visible separators (such as spaces, commas, 
or the like). 

For example, aabba is a string of symbols over the alphabet {a, b, c, 0, 1, 2, 3}
(note that you don’t have to use all the alphabet symbols in any given string, and,
moreover, repetitions are allowed). Ordered means that the position of symbols in
the string matters; e.g., aab ≠ aba.

We denote arbitrary strings over the alphabet A1-A4 by string variables, i.e.,
names that stand for arbitrary¹³ or specific¹⁴ strings. Specific strings, or string
constants, are sometimes enclosed in double quotes to avoid ambiguity. For example, 
if we say 


Let A be the string aab. 


we need to know whether the period is part of the string or not. If it is not we 
symbolically indicate so by writing 


Let A be the string “aab”. 
If it were part of the string, then we would have written instead 
Let A be the string “aab.”. 


String variables—by agreement—will be denoted by uppercase letters A, B, C, 
D, E, P,Q, R, S, W etc., with or without primes or subscripts. In particular, since 
Boolean expressions (and theorems) are strings, this naming is valid for this special 
case, too. 


¹² Some logicians put it more emphatically: “meaningless symbols”.
¹³ E.g., “let A be any string”.
¹⁴ E.g., “let A stand for (¬(p ∧ q))”.
The major operation on strings is concatenation, that is, juxtaposition. To
concatenate the strings (named) A and B, in that order, is to form the string (named)
AB that consists of the symbols in A, from left to right, immediately followed by
the symbols in B, from left to right. Thus, if A is aab and B is 00110, then AB is
aab00110.

Clearly, concatenation is an associative operation, i.e., (AB)C = A(BC). Hence,
when we omit brackets, as we normally do, and simply write ABC, there is no
ambiguity since wherever we may insert the brackets that we “forgot” makes no
difference!

There is a very special string that we call the empty string and denote by ε (this
being a specific string, a constant, we deviate from the naming convention A, B, C, ...
above). What is special about it is that it contains no symbols, so that Aε = εA = A.

“B is a substring of A” means that for some strings C and D we have A = CBD.
For example, over the alphabet {a, b} we have that a is a substring of aab. Indeed
there are two occurrences of a in aab as substrings (each shown in square brackets): A first is justified
by noting aab = ε[a]ab, that is, C = ε and D = ab, and a second is justified by noting
aab = a[a]b, that is, C = a and D = b. □
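
The definition of substring can be tried out directly. The following is a small sketch (the function name `occurrences` is my own) that lists every way of writing A = CBD for given A and B, returning the pairs (C, D); Python's empty string plays the role of ε.

```python
def occurrences(b, a):
    """All decompositions a = c + b + d, returned as a list of (c, d) pairs."""
    return [
        (a[:i], a[i + len(b):])              # c is what precedes b, d is what follows it
        for i in range(len(a) - len(b) + 1)
        if a.startswith(b, i)                # b occurs in a at position i
    ]

print(occurrences("a", "aab"))   # the two occurrences discussed above
print(occurrences("c", "aab"))   # c is not a substring of aab
```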


We can build all sorts of expressions over our Boolean alphabet V, such as
pp, pqr, ¬∧, (r → r), and a lot of others.

Some such strings (e.g., the last one above) are well-formed-formulae (in short, 
wff), the rest being gibberish. 


Hmm. How can we tell? For example, if we asked an unsuspecting (not logically 
trained) passerby which of the following are “well-formed” 

p=p 

Pp ~ 

((p ∨ q) ≡ q) ≡ ((p ∧ q) ≡ p)
we would have no right to expect any better than lucky guesses from him (we can 
check, by asking him “why?” in each case). 

So, how can we tell? The obvious (silly) answer would be, “Why not tabulate all 
formulae? Then we can check any string for formula status by table look-up. If the 
string is in the table, then it is a formula; otherwise it is not a wff.” 

Of course this is silly, for we cannot write down an infinitely long table such as a 
table of all formulae would be. 


We must find a way to define a set of infinitely many strings (the formulae) by a finite 
text. 


Pause. Can we do such a thing? 


Absolutely. We will give a precise process that, every time it is applied, builds a
formula and never builds a nonformula. Moreover, it is “general enough” so that
if it is applied over and over, forever, it will build all formulae.

We are ready to define formula-calculation. 


1.1.3 Definition. (Formula-Calculation or Formula-Parse) We will call formula- 
calculation (or formula-parse) any finite (ordered) sequence of strings that we may 
write respecting the following three requirements: 
(1) At any step we may write any symbol from A1 or A2 of the alphabet (p. 9).


(2) At any step we may write the string (¬A), provided we have already written the
string A.


(3) At any step we may write any of the strings (A ∧ B), (A ∨ B), (A → B),
(A ≡ B), provided we have already written the strings A and B. □


1.1.4 Example. In the first step of any formula-calculation, only requirement (1) of 
Definition 1.1.3 is applicable, since the other two require the existence of prior steps. 
Thus, in the first step, we may write only a variable or a constant. In all other steps, 
all the requirements (1)-(3) are applicable. 

Here is a calculation (the comma is not part of the calculation, it just separates 
strings written in various steps): 


p, ⊤, (¬⊤), q


Verify that the above obeys Definition 1.1.3. 
Here is a more interesting one: 


p, q, (p ∨ q), (p ∧ q), ((p ∨ q) ≡ q),
((p ∧ q) ≡ p), (((p ∨ q) ≡ q) ≡ ((p ∧ q) ≡ p)) □
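
Since Definition 1.1.3 is purely mechanical, it can be checked by a program. Here is a minimal sketch under my own assumptions: strings are written without spaces, variables are spelled as p, q, or r followed by optional primes (') and digits, and the names `ATOM`, `BINARY`, and `is_formula_calculation` are hypothetical. It tests the three requirements step by step.

```python
import re

# Assumed spelling of the symbols in A1 and A2.
ATOM = re.compile(r"[pqr]'*\d*|⊤|⊥")
BINARY = "∧∨→≡"

def is_formula_calculation(steps):
    """Check requirements (1)-(3) of Definition 1.1.3 for a list of strings."""
    for i, s in enumerate(steps):
        earlier = set(steps[:i])                    # strings already written
        if ATOM.fullmatch(s):                       # requirement (1)
            continue
        if s.startswith("(¬") and s.endswith(")") and s[2:-1] in earlier:
            continue                                # requirement (2)
        # Requirement (3): s is (A∘B) with A and B both already written.
        if s.startswith("(") and s.endswith(")") and any(
            s[k] in BINARY and s[1:k] in earlier and s[k + 1:-1] in earlier
            for k in range(2, len(s) - 2)
        ):
            continue
        return False
    return True

longer = ["p", "q", "(p∨q)", "(p∧q)", "((p∨q)≡q)", "((p∧q)≡p)",
          "(((p∨q)≡q)≡((p∧q)≡p))"]
print(is_formula_calculation(longer))
print(is_formula_calculation(["p", "(p∨q)"]))   # q was never written
```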


1.1.5 Definition. (Boolean Expressions or wff or Formulae) A string A over the
alphabet A1-A4 will be called a Boolean expression or a well-formed-formula iff¹⁵
it is a string written at some step of some formula-calculation.

The set of Boolean expressions we will denote by WFF. A member of WFF is
often called a wff (a formula). □


1.1.6 Remark. (1) The idea of presenting the definition of formulae as a “construction”
or “calculation” goes at least as far back as [2, 21].

(2) We used, in the interest of user-friendliness, active or procedural language in
Definition 1.1.3¹⁶ (i.e., that we may do this or that in each step). A mathematically
more austere (hence “colder”(!)) approach that does not call upon anyone to write
anything down, and does not speak of “steps”, would say exactly the same thing
as Definition 1.1.3 rephrased as follows:


A formula-calculation (or formula-parse) is any finite (ordered) sequence of
strings, A_1, A_2, ..., A_n, such that, for all i = 1, ..., n, A_i is one of:


(I) Any symbol from A1 or A2


(II) (¬A), provided A is the same string as some A_j, where 1 ≤ j < i


¹⁵ If and only if.
¹⁶ Exactly as [21] does.

(III) Any of the strings (A ∧ B), (A ∨ B), (A → B), (A ≡ B), provided A is the
same string as some A_j, where 1 ≤ j < i, and B is the same string as some
A_k, where 1 ≤ k < i (it is allowed to have j = k, if needed)


(3) There is an advantage in the procedural formulation 1.1.3. It makes it clear 
that we build formulae in stages (or steps), each stage being a calculation step. 


In each step where we apply requirement (2) or (3) of 1.1.3, we are building a 
more complex formula from simpler formulae by adding a Boolean connective. 


Moreover, we are building a formula from previously built formulae. 


These last two remarks are at the heart of the fact that we can prove properties
of a formula by induction on the number of steps (stages) it took to build it, or more
simply, by induction on its “complexity” (that is, the total number of connectives in
the formula, counting repetitions; see next section). □


The concluding remark above motivates an “inductive” or “recursive” definition
of formulae, which is the favorite definition in the “modern” literature, and we should
become familiar with it:


1.1.7 Definition. (Alternative (Recursive) Definition of WFF) The set of all
well-formed-formulae is the smallest set of strings, WFF, that satisfies


(1) All Boolean variables are in WFF, and so are the symbols ⊤ and ⊥. We call
such formulae atomic.


(2) If A and B are any strings in WFF, then so are the strings (¬A), (A ∧ B),
(A ∨ B), (A → B), (A ≡ B). □


1.1.8 Remark. (a) Why “recursive”? Because item (2) in 1.1.7 defines the concept
formula in terms of (a smaller, or earlier, instance of) “itself”: It says, in essence,
“...(A ∨ B)¹⁷ is a formula, provided we know that A and B are formulae. ...”


In programming terms, confronted with, say, the 3rd subcase of case (2) of the
definition, we “call” the definition recursively, twice, to settle the questions “Is
A a wff?” and “Is B a wff?” If “yes” for both, then we proclaim that (A ∨ B) is
a wff.


(b) Part (1) in 1.1.7 defines the most basic, most trivial formulae. This part constitutes
what we call the Basis of the inductive (recursive) definition, while part (2) is 
called the inductive, or recursive, part of the definition. 


(c) 1.1.7 and 1.1.5 say the same thing, looking at it from opposite ends: Indeed, 
suppose that we want to establish that a given string D is a formula. If we are 
using 1.1.5, we will try to build D via a formula-calculation, starting from atomic 


¹⁷ Each of A and B is a substring (cf. 1.1.2) of (A ∨ B), so they are “smaller” than the latter. They are
also “earlier” in the sense that we must already have them (i.e., know that they are formulae) in order to
proclaim (A ∨ B) a formula.

ingredients and building as we go successively more and more complex formulae
until finally, in the last step, we obtain D. 


If on the other hand we are using 1.1.7, we are working backwards (and build a
formula-calculation in reverse!). Namely, if D is not atomic, we try to guess,
from its form, what was the last connective applied. Say we think that it was
→, that is to say, D is (A → B) for some strings A and B. Now we have to
verify our guess! This requires that the strings A and B are formulae. Thus,
taking each of A and B in turn as (a new, smaller) “D” we repeat this process.
And so on. This is a terminating process since the new strings we obtain (for
testing) are always smaller than the originals.


Of course, I did not prove here that the two definitions define the same set WFF.¹⁸
But they do!


Technically, the term smallest is crucial in 1.1.7 and it corresponds to the similarly
emphasized iff of 1.1.5. A proof that the two definitions are equivalent is beyond
our syllabus. □


1.1.9 Example. Let us verify using 1.1.7 that ((p ∨ q) ∨ r) is a formula.


call #1 We guess that the rightmost “∨” is the last to apply; thus, using (2) in 1.1.7
we must now verify that (p ∨ q) and r are formulae. Well, r is, by (1), in
WFF. However, (p ∨ q) leads to call #2.


call #2 Again, using (2) (this time we do not need any guessing; there is only one
connective) we must verify that p and q are formulae. This is so by (1) in
1.1.7.


When we use Definition 1.1.7 to verify that a string is a formula, we say that we 
parse the string top-down. On the other hand, when we build the formula using 
Definition 1.1.5, then we are parsing it bottom-up. 

Can we parse the above string in another top-down way? Obviously, our recursive
call to the definition hinges around one of the two “∨” symbols in the string. We
must guess which is “the right one” (as the last connective to apply). Why is the
leftmost connective not “right”?

Here is where metamathematical analysis comes in: The leftmost connective will
work as the “last one to apply” iff (according to 1.1.7) “(p” and “q) ∨ r” are both
formulae. The metamathematical analysis of formula syntax¹⁹ (see next section)


¹⁸ I only hand-waved to that effect, arguing that for any string D in WFF, 1.1.5 builds a calculation the
normal way, while 1.1.7 builds it backwards. I conveniently swept under the rug the case where D is not
in WFF, i.e., is not correctly formed.
¹⁹ I know that the separation of “mathematics” from “metamathematics” is at first tricky. Think of the
hammer analogy: You do “theory” (or “mathematics” or “logic”) when you use the hammer. On the other
hand, when articulating a principle such as “It is inevitable that I will hit my finger with the hammer”, then
you are doing an analysis of the hammer’s behavior. You are doing “metatheory” (or “metamathematics”
or “metalogic”).

Similarly, you do logic when you generate a formula according to 1.1.5, or backwards according to
1.1.7. However, articulating principles such as “There is only one way to parse a formula”, or “Every
formula has balanced brackets”, studies, does not build, formulae and thus lies within the metalogic.
tells us that every formula must have a balance of left and right brackets. So neither of
these two is a formula, and therefore the leftmost ∨ cannot be the last connective to
apply. □


In the course of a formula-calculation (1.1.3), we write some formulae down
without looking back (step of type (1)). Some others we write down by combining
via one of the connectives ∧, ∨, →, ≡ two formulae A and B already written, or by
prefixing one already written formula, C, by ¬.

In terms of the construction by stages, the formula built in this last stage had as 
immediate predecessors A and B in the first case, or just C in the second case. 

One can put this elegantly via the following definition: 


1.1.10 Definition. (Immediate Predecessors) None among the constants ⊤ and ⊥,
or among the variables, have any immediate predecessors.

Any of the formulae (A ∧ B), (A ∨ B), (A → B), (A ≡ B) have A and B as
immediate predecessors. A is an immediate predecessor of (¬A).

Sometimes we use the acronym i.p. for immediate predecessor. □


It turns out that a formula uniquely determines its i.p. We give a proof later (1.2.5). 
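
The top-down parse of Example 1.1.9 and the notion of immediate predecessor can be sketched together in code. The sketch below makes several assumptions of my own: formulae are Python strings without spaces, variables are spelled p, q, or r with optional primes (') and digits, and the names `immediate_predecessors` and `is_wff` are hypothetical. It follows Definition 1.1.7 literally by guessing the last connective and calling itself on the two pieces; it is not meant to be efficient.

```python
import re
from functools import lru_cache

ATOM = re.compile(r"[pqr]'*\d*|⊤|⊥")   # assumed spelling of variables and constants
BINARY = "∧∨→≡"

@lru_cache(maxsize=None)
def immediate_predecessors(s):
    """Return the tuple of i.p. of s, or None if s is not a wff."""
    if ATOM.fullmatch(s):
        return ()                                   # atomic: no i.p.
    if len(s) < 4 or s[0] != "(" or s[-1] != ")":
        return None                                 # every nonatomic wff is bracketed
    if s[1] == "¬" and is_wff(s[2:-1]):
        return (s[2:-1],)
    for k in range(2, len(s) - 2):                  # guess the last connective
        if s[k] in BINARY and is_wff(s[1:k]) and is_wff(s[k + 1:-1]):
            return (s[1:k], s[k + 1:-1])            # the guess has been verified
    return None

def is_wff(s):
    return immediate_predecessors(s) is not None

print(immediate_predecessors("((p∨q)∨r)"))
print(is_wff("(p"), is_wff("q)∨r"))     # the two pieces of the wrong guess
```

On ((p∨q)∨r) the guess at the leftmost ∨ is rejected because neither piece is a wff, and the guess at the rightmost ∨ succeeds, in agreement with the example.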


1.1.11 Remark. (Priorities) In practice, too many brackets make it hard to read 
complicated formulae. Thus, texts (and other writings in logic) often come up with 
an agreement on how to be sloppy, but get away with it. 

This agreement tells us what brackets are redundant, and hence can be removed,
from a formula written according to Definitions 1.1.5 and 1.1.7, still allowing the
formula to “say” the same thing as before:


(1) Outermost brackets are redundant. 


For the remaining two cases, it is easiest to think of the process in reverse: How
to reinsert correctly (as per Definition 1.1.5) any omitted brackets.


(2) Any other pair of brackets is redundant, if its presence (as dictated by 1.1.5) can be
understood from the priority, or precedence, of the connectives. Higher-priority
connectives bind before lower-priority ones. That is, if we have a situation where
a subformula²⁰ A of a formula has already been reconstructed as per 1.1.5, and
is claimed by two distinct connectives ∘ and ◇, among those in (*) below, as
in “... ∘ A ◇ ...”, then the higher-priority connective “glues” first. This means
that the implied brackets are (reinserted as) “... ∘ A) ◇ ...” or “... ∘ (A ◇ ...”
according as ∘ or ◇ has the higher priority, respectively.


The order of priorities (decreasing from left to right) is agreed to be:²¹


¬, ∧, ∨, →, ≡          (*)


²⁰ A subformula of a formula B is a substring of B that is a formula.

²¹ Other agreements for priorities are possible. I offered the one that most people use. But remember: It
is only an agreement, which means (1) we must stick to it, and (2) it is neither “more right” nor “more
wrong” than any alternative agreement.


(3) In a situation like “... ∘ A ∘ ...”, where A has already been reconstructed as in
1.1.5 and ∘ is any connective listed in (*) above other than ¬, the right ∘ acts
before the left. Thus the implied bracketing is “... ∘ (A ∘ ...”.


Similarly, ¬¬A is short for (¬(¬A)).


We say that all connectives are right associative. 
It is important to emphasize: 


(a) This “agreement” results in a shorthand notation. Most of the strings depicted by 
this notation are not correctly written formulae, but this is fine: Our agreement 
allows us to decipher the shorthand and uniquely recover the correctly written 
formula we had in mind. 


(b) I gave above the convention that is followed by 99.9% of writings in logic, and 
in (almost) all programming language definitions (when it comes to “Boolean 
expressions” or “conditions”). 


(c) The agreement on removing brackets is a syntactic agreement. 


In particular, right associativity says simply that, e.g., p ∨ q ∨ r is shorthand for
(p ∨ (q ∨ r)) rather than ((p ∨ q) ∨ r).


However, no claim is either made or implied that (p ∨ (q ∨ r)) and ((p ∨ q) ∨ r)
mean different things. At this point meaning (Boolean values) has not yet been
introduced. When it is later on, we will easily see that (p ∨ (q ∨ r)) and ((p ∨ q) ∨ r)
mean the same thing. □


1.1.12 Example. p stands for p.

¬p stands for (¬p).

p → q → r stands for (p → (q → r)).

If I want to simplify ((p → q) → r), then (p → q) → r is as simple as I can get
it to be.

¬p ∧ q ∨ r is short for (((¬p) ∧ q) ∨ r).

If in the previous I wanted to have ¬ act last, and ∨ to act first, then the minimal
set of brackets necessary is: ¬(p ∧ (q ∨ r)). □
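
The agreement of Remark 1.1.11 can be turned into a procedure that reinserts the omitted brackets. What follows is a sketch only, using a standard technique (precedence climbing) that I am swapping in for illustration; the names `PRIORITY`, `tokenize`, and `restore_brackets` are hypothetical, variables are assumed to be spelled p, q, or r with optional primes (') and digits, and spaces in the input are ignored.

```python
import re

TOKEN = re.compile(r"[pqr]'*\d*|[⊤⊥¬∧∨→≡()]")
PRIORITY = {"∧": 4, "∨": 3, "→": 2, "≡": 1}    # ¬ binds tighter than all of these

def tokenize(text):
    compact = "".join(text.split())             # drop all whitespace
    tokens = TOKEN.findall(compact)
    if "".join(tokens) != compact:
        raise ValueError("symbol outside the alphabet")
    return tokens

def restore_brackets(text):
    """Return the fully bracketed formula that the shorthand text stands for."""
    tokens, pos = tokenize(text), 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "¬":
            return "(¬" + operand() + ")"       # ¬ claims the nearest operand
        if tok == "(":
            inner = expression(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing right bracket")
            pos += 1
            return inner
        if tok == ")" or tok in PRIORITY:
            raise ValueError("operand expected")
        return tok                              # a variable or a constant

    def expression(min_priority):
        nonlocal pos
        left = operand()
        while (pos < len(tokens) and tokens[pos] in PRIORITY
               and PRIORITY[tokens[pos]] >= min_priority):
            op = tokens[pos]
            pos += 1
            # Allowing the same priority on the right makes op right associative.
            right = expression(PRIORITY[op])
            left = "(" + left + op + right + ")"
        return left

    result = expression(1)
    if pos != len(tokens):
        raise ValueError("unexpected symbol")
    return result

print(restore_brackets("p → q → r"))
print(restore_brackets("¬p ∧ q ∨ r"))
```

The two calls reproduce the bracketings (p→(q→r)) and (((¬p)∧q)∨r) of Example 1.1.12, printed without spaces.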


A connection with things to come in a degree program in computer science. 
Any set of rules that tells us how to correctly write down strings constitutes a so-called
grammar. Formal language theory studies grammars, the sets of strings that
grammars define (called formal languages), and the procedures (or “machines”) that 
are appropriate to parse these strings. 

In an introductory course on “automata and formal language theory”, a student 
learns about formal languages. Such a student would quickly realize that Definition 1.1.7
is, in effect, a definition of a grammar for the “language” (i.e., set of
strings) WFF. He would utilize a neat notation,²² such as

E ::= A | (E ∧ E) | (E ∨ E) | (E → E) | (E ≡ E) | (¬E)

A ::= ⊤ | ⊥ | p | q | r | p′ | ...


where E stands for (Boolean) Expression, A for Atom, “::=” is read as “is defined
to be” and “|” is read as “or”, separating alternatives in an “is defined to be”-list.


Thus, the first line says, in English, “A (Boolean) expression is defined to be an
atom, or ‘(’ followed by an expression, followed by ‘∧’, followed by an expression,
followed by ‘)’, or, etc.”

The second line defines atom as any of the constants or the variables (note the 
separating or’s). 


1.2 INDUCTION ON THE COMPLEXITY OF WFF: SOME EASY 
PROPERTIES OF WFF 


Suppose now that we want to prove that every A ∈ WFF²³ has a “property” P.

The technique is to associate a natural number with each member of WFF and 
prove the property by induction on numbers. The most obvious number one may 
associate with a formula A is the formula’s complexity: 


1.2.1 Definition. (Complexity of a Formula) The complexity of a formula is the
number of connectives, counting repetitions, occurring in the formula. □


1.2.2 Example. Note that we can read the complexity accurately even if we write 
formulae in least parenthesized notation. 

Every atomic formula has complexity 0. The complexities of p → q → p′,
¬p ∨ q ∨ r, and ¬p ∧ p′ ∨ p″ ≡ (p‴ ≡ q) are 2, 3, and 5 respectively.

Brackets do not contribute to complexity. □
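
Since complexity only counts connective occurrences, it is a one-line computation. A small sketch, with the function name `complexity` and the ASCII apostrophe for primes being my own choices:

```python
CONNECTIVES = "¬∧∨→≡"

def complexity(formula):
    """Number of connective occurrences in the formula, counting repetitions."""
    return sum(formula.count(c) for c in CONNECTIVES)

# Works on fully bracketed formulae and on least parenthesized notation alike,
# because brackets and spaces are simply not counted.
for text in ["p", "p → q → p'", "(((¬p) ∧ q) ∨ r)", "¬p ∧ q ∨ r"]:
    print(text, complexity(text))
```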


A crash course on induction. First off, let us recall what we call strong or
course-of-values induction on the natural numbers (also known as complete induction):
Suppose that P(n) is a property of the natural number n. To prove that P(n)
holds for all n ∈ N²⁴ it suffices, in principle, to prove for the arbitrary n that P(n)
holds.
What we mean by “arbitrary” is that we do not offer the proof of P(n) for some
“biased” n such as n = 42, or n even, or n with 105 digits, etc. If the proof indeed


²² Known as BNF notation, or Backus-Naur-Form notation.

²³ “x ∈ y” is shorthand for the claim “x is a member of the set y”, also pronounced “x belongs to y” or
“x is in y”.

²⁴ N denotes the set of all natural numbers {0, 1, 2, 3, ...}. Thus “for all n ∈ N” is elegant notation that
says “for n = 0, 1, 2, 3, ...”.


has not cheated by using some property of n beyond “n ∈ N”, then our proof is
equally valid for any n ∈ N; we have in effect succeeded in proving P(n), for all
n ∈ N.


Now the above endeavor is not always easy. It would probably come as a
surprise to the uninitiated that we can pull an extra assumption out of the blue
and use it toward proving P(n), and that when all is said and done this process
is as good as if we proved P(n) without the extra assumption!


This out-of-the-blue assumption is that

P(k) holds for all k < n          (I)

or, another way of putting it, that the history, or course-of-values, of P(n),

P(0), P(1), ..., P(n − 1)          (II)


holds; that is, it is a sequence of valid statements. It goes by the name induction
hypothesis (I.H.), and the technique is that of “proof by strong induction”.

A couple of comments: 

(1) As before, we still have to prove P(n) for the arbitrary n, although now we
have the I.H. as extra help.

(2) We note that the history, (II), of P(n) is empty if n = 0. Thus every proof by
strong induction has two cases to consider: the one where the history helps, because
it exists, i.e., when we have n > 0, and the one where the history does not help,
because it simply does not exist, i.e., when n = 0.

In summary, strong induction proofs have two cases:

I.S. Where n > 0 and we are helped by the I.H. ((I) or (II) above). I.S. is an
abbreviation for induction step.

Basis. Where n = 0 and we are on our own! The proof for n = 0 is called the basis
step of the induction.

Since on occasion we will also employ “simple” induction in this book, let me
remind the reader that in this kind of induction the I.H. is not the assumption of
validity of the entire history, but that of just P(n − 1). As before, simple induction
is carried out for the arbitrary n, so we need to work out two cases: when the I.H. is
really there (n > 0) and when it is not (n = 0). The case of proving P(0) directly
is still called the basis of the (simple) induction.

Tradition has it that in performing simple induction the majority of users in the
literature take as I.H. P(n) while the I.S. involves proving P(n + 1).

Correspondingly, we organize proofs of properties of formulae X, P(X), into
two main cases (rather three, in practice; see below), essentially carrying out a
strong induction with the complexity n of X as a “proxy”. However, the complexity
n of X is well hidden in the background of the argument and we do not mention it:


(i) Case of atomic formulae (these are the only ones with complexity n = 0) where
the proof is direct, without the benefit of the I.H.

(ii) Case of nonatomic formulae (corresponding to a complexity (of X) n > 0)
where we will benefit from the I.H. that P(A) holds for all formulae A that
are less complex than X (i.e., they have complexity k < n)


In case (ii), if A is any formula less complex than X, we will often say that
“the I.H. applies to A”, meaning precisely that “by the I.H., P(A) holds”.


Let us apply (i)-(ii) to obtain a framework of proofs by induction on the set of
formulae or, as we say more simply, by induction on formulae.

Now, since every proper²⁵ subformula A of X has a lesser complexity than X, the
I.H. applies to A. In particular, the I.H. applies to all the i.p. of X (the definition of
i.p. was given in Definition 1.1.10).


Thus, in practice, (i)-(ii) translate into the following simple framework for proofs 
by induction on formulae: 


(a) X is atomic: Give a direct proof.
(b) X has the form (¬A). Give a proof on the assumption (I.H.) that 𝒫(A) holds.


(c) X has the form (A ∘ B), where ∘ ∈ {∧, ∨, ≡, →}. Give a proof for each case
of ∘ on the assumption (I.H.) that 𝒫(A) and 𝒫(B) hold.


Let us now prove a few properties of formulae by induction on formulae. 
All the “theorems” (and their corollaries) of this section are about formulae and 
their syntax. They are not theorems of logic, but are metatheorems. 


1.2.3 Theorem. Every Boolean formula A has the same number of left and right 
brackets. 


Proof. The theorem is about formulae written properly, as per Definition 1.1.5, that 
is, before our agreements to simplify bracketing are applied. 
We prove the property by induction on formulae, A. 


(1) Basis: A is atomic. Each atomic formula has 0 left and 0 right brackets. We are
okay.


(2) A has the form (¬B). The I.H. applies on the less complex B. So let B have m
left and m right brackets. Then A, i.e., (¬B), has m + 1 of each.

(3) A has one of the forms (B ∧ C), (B ∨ C), (B → C) and (B ≡ C).


The I.H. applies to the less complex subformulae B and C. So let them have m
left/right and r left/right brackets respectively. Thus A has m + r + 1 left, and
as many right, brackets. □
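The three cases of the proof can be mirrored by a recursive computation. The following is a minimal Python sketch, not part of the formal development: it assumes a hypothetical encoding of formulae as nested tuples (the names `show` and `brackets` are chosen here for illustration only), writes a formula out with all its brackets, and counts brackets case by case exactly as the proof does.

```python
# Hypothetical encoding (an assumption of this sketch):
#   an atom is a string such as "p", "⊤" or "⊥";
#   ("¬", A) stands for (¬A); (op, A, B) stands for (A op B).
BINARY = {"∧", "∨", "→", "≡"}

def show(a):
    """Write the formula with all its brackets, before any bracket-saving agreements."""
    if isinstance(a, str):            # atomic: no brackets at all
        return a
    if a[0] == "¬":
        return "(¬" + show(a[1]) + ")"
    op, b, c = a
    assert op in BINARY
    return "(" + show(b) + op + show(c) + ")"

def brackets(a):
    """Return (left, right) bracket counts, following the cases (1)-(3) of the proof."""
    if isinstance(a, str):            # (1) basis: 0 left, 0 right
        return (0, 0)
    if a[0] == "¬":                   # (2) one more of each than the i.p.
        left, right = brackets(a[1])
        return (left + 1, right + 1)
    _, b, c = a                       # (3) m + r + 1 of each
    lb, rb = brackets(b)
    lc, rc = brackets(c)
    return (lb + lc + 1, rb + rc + 1)

sample = ("→", ("¬", "p"), ("∧", "q", "⊥"))
text = show(sample)
print(text, brackets(sample))
```

The recursion in `brackets` never inspects the string itself; comparing its result with a direct count on `show(sample)` is a quick sanity check of the argument on one instance, not a substitute for the proof.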


²⁵ That is, not the same string as X.

Note. A string B is a prefix of a string A iff there is a string C such that A = BC.
The prefix is empty iff it is the empty string (i.e., it has no symbols in it; it has length
0). It is proper iff A ≠ B.


1.2.4 Corollary. Any nonempty proper prefix of a Boolean expression A has more 
left than right brackets.²⁶


Proof. Induction on A. 


Since A denotes an arbitrary formula, “induction on formulae” can be rephrased 
as “induction on A”. Compare with “induction on natural numbers” and “induction
on n”.


(1) A is atomic. Note that none of the atomic formulae has any nonempty proper 
prefixes, so we are done without lifting a finger. 


This is an instance of a statement being “vacuously true”: The statement has
a typical instance that says, “All nonempty proper prefixes of p have more left
than right brackets.” Is this true? Absolutely! If you think otherwise, then show
me just one nonempty proper prefix of p that does not have an excess of left
brackets. You cannot, because there are no nonempty proper prefixes of p. (The
only nonempty prefix of p is p, but this is improper.)


(2) A has the form (¬B). The I.H. applies to B. Well, let’s check the nonempty
proper prefixes of (¬B). These are (quotes not included, of course):


(a) “(”. Okay, by inspection.
(b) “(¬”. Ditto.


(c) “(¬C”, where C is a nonempty proper prefix of B. By I.H., if m is the
number of left and n the number of right brackets in C, then m > n. But the
number of left brackets of “(¬C” is m + 1. Since m + 1 > n, we are done.


(d) “(¬B”. By 1.2.3, B has, say, k left and k right brackets. We are okay, since
k + 1 > k.²⁷


(3) A has the form (B ∘ C), where “∘” is any of ∧, ∨, →, ≡. The I.H. applies on
B and C. Well, let’s check the nonempty proper prefixes of (B ∘ C). These are
(quotes not included, of course):


(i) “(”. Okay, by inspection.


²⁶ Corollary is jargon that mathematicians, logicians, computer scientists, philosophers (and other
reasoning people) use to characterize a statement that needs proof, but whose proof follows easily from
another proved statement, or from the latter’s proof. One then speaks of “A is a corollary of B”, meaning
that A easily follows from B (and/or B’s proof).

²⁷ The I.H. was not needed in this step.

(ii) “(D”, where D is a nonempty proper prefix of B. By I.H., if m is the
number of left and n the number of right brackets in D, then m > n. But
the number of left brackets of “(D” is m + 1. Okay.


(iii) “(B”. By 1.2.3, B has, say, k left and k right brackets. We are okay, since
k + 1 > k.


(iv) “(B∘”. The accounting exercise is exactly as in (iii). Okay.


(v) “(B∘D”, where D is a nonempty proper prefix of C. By 1.2.3, B has, say,
k left and k right brackets. By I.H., D has, say, m left and r right brackets,
where m > r. Thus, “(B∘D” has 1 + k + m left and k + r right brackets.
Okay!


(vi) “(B∘C”. Easy. □
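The claim of 1.2.4 is easy to test mechanically on any one formula. The sketch below is only an illustration under stated assumptions: the sample string is hand-written and assumed to be a fully bracketed formula, and the helper names `excess` and `proper_prefixes` are hypothetical.

```python
def excess(s):
    """Number of left brackets minus number of right brackets in the string s."""
    return s.count("(") - s.count(")")

def proper_prefixes(s):
    """All nonempty proper prefixes of the string s."""
    return [s[:i] for i in range(1, len(s))]

formula = "((¬p)→(q∧⊥))"        # assumed well formed, with all brackets present

# Every nonempty proper prefix must have strictly more left than right brackets.
for prefix in proper_prefixes(formula):
    assert excess(prefix) > 0, prefix

# The whole formula is balanced (1.2.3), and an atomic formula has no such prefixes,
# so for atoms the claim holds vacuously.
print(excess(formula), proper_prefixes("p"))
```

A check of this kind examines a single string; the corollary itself needs the induction, since it speaks about all formulae at once.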


The following tells us that once a formula has been written down correctly, there 
is a unique way to understand the order in which connectives apply. 


1.2.5 Theorem. (Unique Readability) For any formula A, its immediate predecessors
are uniquely determined.


Proof. Obviously, if A is atomic, then we are okay (nothing to prove, for such
instances of A have no i.p.). Moreover, no A can be seen (written) as both atomic
and nonatomic.²⁸ The former do not start with a bracket; the latter do (cf. 1.2.6).


Suppose that A is not atomic. Is it possible to build this string as a formula in 
more than one way? 


Can A have two different sets of i.p., as listed below? (Below, when I say “we are
okay” I mean that the answer is “no”, as the theorem claims.)


(1) (¬C) and (¬D)? Well, if so, C is the same string as D (why?); so in this case
we are okay.


(2) (¬C) and (D ∘ E), where ∘ is any of ∧, ∨, →, ≡? Well, no (which means we are
okay in this case too). Why “no”? For if (¬C) and (D ∘ E) are identical strings
(they are, supposedly, two ways to read A, remember?), then “¬” must be the
same symbol as the first symbol of D. Now the first symbol of D is one of “(”
or an atomic symbol. None matches “¬”.


(3) (C ∘ D) and (E ⋄ G), where ∘ and ⋄ are any of ∧, ∨, →, ≡ (possibly the same
symbol) and either C and E are different strings, or D and G are different strings,
or both? Well, no!


(i) If C and E are different, then, say, C is a proper prefix of E (of course, C
is nonempty as well (why?)). By 1.2.4, C has more left brackets than right
ones, but, being also a formula, it has the same number of left and right
brackets (by 1.2.3). Impossible! The other case, E being a proper prefix of
C instead, is equally impossible.


(ii) If C and E match, then ∘ and ⋄ match. This forces D and G to be the same
string, since the strings (C ∘ D) and (E ⋄ G) are the same. Okay again.


²⁸ So we cannot be so hopelessly confused as to think at one time that A has no i.p. and at another time
that it does.


Having answered “no” in all cases, we are done. □
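Unique readability is what lets a program recover the i.p. of a formula from its string. The following is a minimal sketch under stated assumptions: the input is assumed to be a correctly formed, fully bracketed formula, and the function name `immediate_predecessors` is hypothetical. For a formula (B ∘ C) it looks for a binary connective lying outside all inner brackets; the internal assertion records that exactly one such position exists.

```python
BINARY = "∧∨→≡"

def immediate_predecessors(s):
    """Return the i.p. of a fully bracketed formula string s (empty list if s is atomic)."""
    if not s.startswith("("):
        return []                       # atomic formulae have no i.p.
    body = s[1:-1]                      # drop the outermost pair of brackets
    if body.startswith("¬"):
        return [body[1:]]               # s is (¬B); its only i.p. is B
    depth = 0
    splits = []
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in BINARY and depth == 0:
            splits.append(i)            # a connective outside all inner brackets
    assert len(splits) == 1             # exactly one way to read s as (B ∘ C)
    i = splits[0]
    return [body[:i], body[i + 1:]]

print(immediate_predecessors("((¬p)→(q∧⊥))"))
print(immediate_predecessors("(¬(p∨q))"))
```

The depth counter relies on 1.2.3 and 1.2.4: the first i.p. is balanced, while each of its nonempty proper prefixes has an excess of left brackets, so the main connective is the only binary connective met at depth 0.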


1.2.6 Exercise. Prove that the first symbol of any formula A is one of 
(1) a variable 


(2) ⊤

(3) ⊥

(4) a left bracket

Hint. Induction on formulae, or directly from an analysis of formula-calculations
(1.1.3). □


1.2.7 Exercise. In footnote 20 of p. 15 we defined the concept of subformula, saying: 
“A subformula of a formula A is a substring of A that is also a formula.” 

This definition does not offer itself toward showing rigorously that, e.g., “If all the 
occurrences of a subformula B of A are replaced by the same Boolean variable, say 
p, then the string so obtained is a formula.” 


Do the following: 
(1) Try to contradict me (I said, “This definition does not offer itself toward
showing rigorously that, e.g., ...”).


(2) Regardless of how you did in (1), give an inductive definition of the concept
subformula.

(3) Now use (2) to prove by induction on A that “If all the occurrences of a
subformula B of A are replaced by the same Boolean variable, say p, then the string
so obtained is a formula.” □


1.3. INDUCTIVE DEFINITIONS ON FORMULAE 


Now that we know (by 1.2.5) that we can decompose a formula uniquely into its
constituent parts, we are comfortable with defining functions (more generally “concepts”)
on formulae by induction (or recursion) on formula complexity, or as we
rather say, “by induction (or recursion) on formulae”.²⁹


This recursion will define the concept as follows: 


• (Basis) If A is atomic, we define the concept (or function) directly, depending
on what we are trying to achieve.


• If A is (¬B), then we “call” the definition recursively to define the concept for
B. Depending on the nature of the concept we then, taking the presence of ¬
into account, extend the concept to the entire A.


• If A is (B ∘ C), then we “call” the definition recursively for B and C. Depending
on the nature of the concept, taking the presence of ∘ ∈ {∧, ∨, ≡, →} into
account, we extend the concept to the entire A.


²⁹ Some people prefer to use the term induction for proofs, and recursion for constructions. Others do not
mind using the term induction for either.


I will merely state that the above process is feasible because of 1.2.5, which allows 
us to have uniquely determined “components” of A, its i.p., on which we perform the 
“recursive calls”. 

A rigorous proof that indeed the process works, that we can effect recursive 
definitions on sets such as WFF, which themselves have been inductively defined 
(1.1.7), is beyond our aims. This subject is fully considered in [54]. 


1.3.1 Example. We consider here a simple example that shows what can happen if 
we attempt a recursive definition on a set of formulae that were defined in a manner 
that the uniqueness of i.p. was not guaranteed. 
This time we define simple arithmetic formulae, without variables. As an alphabet 
we take 
{1, 2, 3, +, ×}    (1)


Inductively, we define the set “arithmetic formulae”, AR: 


AR is the smallest possible set of strings over the alphabet (1) that contains
the strings of unit length, 1, 2 and 3, and, moreover, if the strings X and Y are
in AR, then so are X + Y and X × Y.

The strings 1, 2, and 3 are the atomic formulae of AR.


The concept i.p. is defined on the formulae of AR in the obvious manner: The
atomic formulae do not have any i.p., and X + Y and X × Y have, each, X and Y
as i.p.

1 + 2 × 3 is an example of a formula in AR that does not have a unique i.p.

Indeed, according to the definition, we have two sets of i.p. here: {1, 2 × 3} and
{1 + 2, 3}.

Both i.p. sets are correct. Remember that any agreement on the priority of the 
connectives—and we entered into no such agreement—is not part of the rigorous 
definition of formula syntax for AR; thus, let us not assume that any such agreement 
is implied here! 

But why do we fear the multiplicity of i.p. sets for 1 + 2 × 3?


Let us attempt to define an “evaluation” function, inductively, on the set AR. We
will call it EV. Here is the “natural” definition EV:


EV(1) = 1
EV(2) = 2
EV(3) = 3
EV(X + Y) = EV(X) + EV(Y)
EV(X × Y) = EV(X) × EV(Y)

Now, what is EV(1 + 2 × 3)?


To answer this, we need to decompose 1 + 2 × 3 into a set of i.p. so that we can
next do our recursive calls of EV (see the two last cases in the definition of EV).
Unfortunately, we obtain two distinct answers!


First we compute according to the decomposition {1, 2 × 3}:


EV(1 + 2 × 3) = EV(1) + EV(2 × 3)
              = EV(1) + (EV(2) × EV(3))
              = 1 + (2 × 3)
              = 7


Next, let us do so according to the other decomposition, {1 + 2, 3}:


EV(1 + 2 × 3) = EV(1 + 2) × EV(3)
              = (EV(1) + EV(2)) × EV(3)
              = (1 + 2) × 3
              = 9 □
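The failure of EV to be a function can be observed by brute force. The sketch below is an illustration only: the name `ev_all` is hypothetical, and the code assumes its argument is a string of AR written without spaces. It tries every way of splitting the string into i.p. and collects every value that the “natural” equations can produce.

```python
def ev_all(s):
    """Set of all values that EV can assign to the AR string s, one per choice of i.p."""
    if s in ("1", "2", "3"):
        return {int(s)}                      # the three basis equations
    values = set()
    for i, ch in enumerate(s):
        if ch in "+×":                       # read s as X + Y or X × Y, split at position i
            left, right = s[:i], s[i + 1:]
            for x in ev_all(left):
                for y in ev_all(right):
                    values.add(x + y if ch == "+" else x * y)
    return values

print(sorted(ev_all("1+2×3")))   # [7, 9]
```

Two values come back for the one string 1 + 2 × 3, one per decomposition. With brackets required around every nonatomic formula, as for Boolean formulae, only one split would be possible at each step and the set would always be a singleton.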


“Natural”, or “only”? I said earlier: “Here is the “natural” definition EV.” 

But is there any other? Yes, infinitely many! We must get used to the idea that 
once we define the syntax of a set of strings—of a formal language, as we say in the 
theory of computation, the language here being AR—we do not have anything more 
than the syntax, i.e., the knowledge of the “shape” of such strings. All strings in AR 
are “meaningless”, and their semantics (or interpretation) is totally up to us. The 
variety of such interpretations at our disposal is infinite. 

I wanted the above example to be immediately relevant to our existing knowledge, 
to be “natural”. That is why I gave the meaning to all the meaningless symbols of 
alphabet (1) that anyone would likely expect. 

However, a “meaningless” symbol such as “1” may stand for infinitely many
different objects of mathematics. Staying in algebra, I will mention the following
(infinitely many) interpretations: The symbol may be interpreted, as here, to be the
number “one”, but also as the unit “2 × 2” matrix

    ( 1 0 )
    ( 0 1 )

or the unit “3 × 3” matrix

    ( 1 0 0 )
    ( 0 1 0 )
    ( 0 0 1 )

or...
But “unit” of some sort or another is not an intrinsic meaning of “1”. The symbol
could stand for the number 0, or 42, or for some other mathematical object.
Similarly, the meaningless symbol “+” can be interpreted as “plus” on numbers,
but also as “plus” on m × n matrices (for various m and n), and also as concatenation of
strings, union of so-called regular expressions, etc. Similar comments hold for all
the other symbols of the alphabet (1).


There will be two main examples of recursive definitions in this section. The first 
follows the definition of the concept of state and is the inductive definition of value 
of a formula in a state. This leads to Boolean formula semantics. 

The other will be the inductive definition of substitution of a formula into a
variable, an operation that is central in the use of the “Leibniz rule” in proofs.

But first, let us introduce states and Boolean semantics. 

As we said early on (p. 6), Boolean logic is a subset of the (full) logic on first
order languages and is, mainly, a pedagogical tool,³⁰ since it is “easy” and therefore
its study painlessly trains us and prepares us for the study of predicate logic.

Does it have any other use? Yes. We can imagine that the Boolean variables 
of propositional logic are “abstractions” of statements in mathematics, computer 
science, philosophy, etc. By the term abstraction of a statement I mean the assignment 
of a name—a Boolean variable—to it, purposely forgetting the intrinsic semantic 
content (i.e., what it says) of the original statement. 

As an example, we can decide the logical correctness or not of the statement 


x = ℵ₀ → x = ℵ₀ ∨ y > ℵ₁    (*)


within Boolean logic by the method of abstraction, not bothering with the fact that
most of the symbols above (i.e., x, y, =, >, ℵ₀, ℵ₁) are not even in the alphabet V
of propositional logic!

Indeed, we abstract the elementary statements³¹ “x = ℵ₀” and “y > ℵ₁”, naming
them, say, p and q. Since they are two distinct statements, I used two distinct Boolean
variables. Thus (*) becomes


p → p ∨ q    (**)

and, as we will soon see, it holds independently of what hides under the names p 
and q. 


Two useful observations are motivated from this example on abstraction and are 
noteworthy: 


• The “object” of propositional logic is not the study of the elementary “statements”
(or “propositions”), that is, of the Boolean variables and their intrinsic
“semantic content”. After all, variables have no intrinsic semantic content.
Once we name through such a variable an “elementary” (i.e., connective-free)
substatement, we turn around and forget the meaning of the original!


To put it positively, the “object” of study is, exclusively, the Boolean connectives
and their behavior.


³⁰ E.g., advanced texts such as [45, 53] introduce predicate calculus directly and do not cover propositional
logic.
³¹ What makes them “elementary” is that they do not involve Boolean connectives.


• The statements that we can abstract are not restricted to those mathematical
ones (or other) that are variable-free, like “3 = 9 → 7 > 101”. E.g., the
elementary statement “x = ℵ₀” above depends on the variable x and therefore
whether it is intrinsically true or false is indeterminate (depending on the value
of x). But this did not hinder our abstraction in any manner!


Here is why: In the process of abstraction—to which we come back in detail 
in 4.1.25—the Boolean variables that we use as names do not inherit the 
semantic content of the statements they name. Thus we do not care whether 
the truth or falsehood of what a Boolean variable names can be determined or 
not! 


The only thing that matters is the Boolean structure of the original statement, 
that is, how the original statement is put together where the connectives act as 
“glue”. In the previous example, all we needed to know was that the statement 
had the Boolean structure (**). 


The semantics of Boolean formulae is defined—in the metatheory of propositional 
logic—through a process that allows us to calculate whether a formula is true or false, 
and this under certain conditions. 

Our aim below is to make precise what we mean by “conditions”, and to give the 
process according to which we calculate the “truth value” of a formula under any 
conditions. 

As you probably know from programming courses, only two values are possible 
for a formula in “classical” Aristotelian logic as well as in its descendant, mathe- 
matical logic, which we study here. These two values are true and false—which we 
collectively call truth values. 

Thus we need a set of two distinct objects, which we will find outside the alphabet 
V, in logic’s metatheory. We freeze, i.e., reserve, these two values in this volume and 
they will serve, to the last page, as our truth values. 

Our choice is the set {t, f} of truth values. We will pronounce t as true and f as
false.

Some programming languages, and even some books on logic, use different sets of truth
values, such as {0, 1}. At the end of the day, neither how we write them down nor
how we pronounce the truth values matters, as long as we have exactly two distinct
ones!


1.3.2 Definition. A state v³² is a function that assigns the value f or t to each Boolean
variable, while it assigns necessarily the value f to the constant ⊥ and necessarily
the value t to the constant ⊤.

We pronounce f and t “false” and “true” respectively. On the chalkboard one
usually denotes them by f and t respectively.

If, say, the value f is assigned to q″, then we write v(q″) = f. □


³² “v” for value. An alternative letter is “s” for state.

1.3.3 Remark. (1) A state v is one of the infinitely many possible “conditions” where
we are interested in finding the truth value of a formula and where it is possible to
compute such a value.

(2) A function v is, of course, a table of input/output values such as


in    out
⊥     f
⊤     t
p     t
q     f

where no two rows contain the same input. Disobeying this condition would result 
in ambiguity, assigning to a variable both values f and t. 

By definition, a state is an infinite table, so we cannot fit it on a page. Mathemati- 
cally, it is an infinite set of input/output ordered pairs. 


Since the truth values f and t lie outside the alphabet V (p. 9) of our logic, they
are symbols that, despite the similarity of their pronunciation with that of the names
of the “meaningless” formal ⊥ and ⊤ respectively, are different from the latter.

In particular, neither the metasymbol f nor the metasymbol t may appear in a 
formula! 

But why the fuss of assigning the values f and t to the (formal) variables and 
constants? How does this process give a “meaning” to the variables and constants? 

Why do we need the f and t? Where do they come from? Aren’t these two
symbols, well, just symbols? If so, what do we gain by their introduction and why are
the ⊥, ⊤ “meaningless” while the f and t are “meaningful”? What is their understood
meaning?

These are good questions! Here are some answers: 


• The symbols ⊥, ⊤ (but also ¬, ∨, ∧, →, ≡) are “meaningless” in the sense that
we know nothing of them besides what the axioms will lead us to know: Their
behavior and properties are determined only by the axioms and the rules of
writing proofs in Boolean logic.


On the other hand, the symbols f and t (as well as the counterparts of ¬, ∨, ∧,
→, ≡, namely, the F_¬, F_∨, etc.; see the table in 1.3.4) are directly given via
tables in the metatheory, as part of the elementary “Boolean algebra” that
we learn in computer programming. It turns out the properties of the latter
faithfully track the properties of the former, and thus in a natural way provide
a “concrete” interpretation of the former.


• An analogy may shed more light on the above discussion: Think of axiomatic
Euclidean geometry. There we learn the properties of, and interrelationships
between, the “meaningless” concepts of point, line and plane by rigorous
proofs that are based on the axioms and proof-writing rules, but on nothing
else. Yet, there is another, “naive”, kind of geometry, analytic geometry. In this
geometry, a “point” is not something abstract whose properties we wait to learn 
from axioms; rather, it is a concrete, well-understood object of mathematics: an 
ordered pair of real numbers, (x, y). Similarly, a (planar) “line” is an algebraic 
expression on two variables x and y and real coefficients a,b,c: ax + by = c. 
One finds that properties of the abstract (or “meaningless”) points and lines
of the axiomatic version are faithfully tracked by those of the corresponding 
concepts of the “naive” versions. In this sense, the points and lines of the latter 
provide a “concrete” interpretation of the points and lines of the former. 


But why interpret? Because even if we are determined to write proofs solely 
based on axioms and rigid rules of logic, it aids our motivation and ability 
to formulate such proofs if we have a “concrete” counterpart in mind. For
example, another interpretation of axiomatic geometry is that of geometric 
drawings, “figures” as we say. You will recall from your high school years 
how helpful these figures were in your formulation of proofs in geometry. The 
geometer M. Pasch once wrote a totally figureless monograph on axiomatic 
geometry, presumably to emphasize that the figures we draw in geometry 
are only intuitive visual aids that are theoretically redundant. Yet, it seems 
that the human brain does well with some sort of assistance, be it visual or 
some other well-understood concrete representation of abstract objects, toward 
understanding and formulating abstract arguments. 


Analogously with the above, we are often motivated and aided by our knowl- 
edge of informal Boolean algebra—that is, the set {f, t} and the various oper- 
ations on it as introduced in 1.3.4 below—as we construct proofs in axiomatic 
logic. 


Interpretations of abstract concepts by concrete ones provide a powerful tool 
toward building counterexamples: The “faithful” nature of such interpretations 
means that if I can prove a statement involving abstract concepts, let us call it 
A, then its concrete counterpart, let us call it A’, is also verifiable. 


Turning this around I get a very useful observation: If I believe that A is not 
provable, I can offer indisputable evidence for this provided I can show, in the
concrete domain, that A’ is false. This comment will make more sense later, 
in 3.1.5. 


Hmm. It appears that this discussion builds too strong a case for the import of 
the “naive” or “concrete” approach, even though early on (p. 4) I said that the 
abstract (syntactic) approach will be favored in this book. I quote: 


We will learn that correctly written proofs are finite and “checkable”
means toward discovering mathematical “truths”. We will also
learn via a lot of practice how to write a large variety of proofs that
certify all sorts of useful truths of mathematics.


The above task, writing proofs (or “programming in logic” if you
will), is our main aim.

Yet, we noted that the concrete methods track the abstract (axiomatic) approach
faithfully. They provide motivation and aid the construction of proofs. They 
provide definitive evidence regarding the falsehood of mathematical state- 
ments. Then why bother with the axiomatic approach? When it comes to 
logic, would it not be best to work exclusively with Boolean algebra instead? 


There are a number of reasons why the axiomatic approach has attained a 
prominent status, even in the undergraduate curricula: 


(a) There is more to logic (predicate logic of Part II) that cannot be tracked
by Boolean algebra. For predicate logic the concrete counterparts are in 
general infinitary, i.e., deal with infinite sets and operations with infinitely 
many arguments, such as searching an infinite set to determine if an object 
belongs to it. 

By contrast, syntactic proofs of the axiomatic method continue to be finite 
processes. Where the advantage lies is clear! 

(b) The axiomatic method was introduced as a mind-focusing device: Focus 
on what matters, via axioms and rules, and discard all that is extraneous 
to our assumptions. The approach literally saved mathematics from 
the paradoxes that the purely concrete, or naive, set theory of Cantor 
introduced. 

(c) The axiomatic method makes logic—and any theory that we build upon 
logic, e.g., modern set theory, Peano number theory—a mathematical 
object, just as a programming language is a mathematical object. This 
allows us to use mathematical tools to study logic—and any mathemat- 
ical theories built upon it—as to its power, limitations, freedom from 
contradiction, etc. □


1.3.4 Definition. (Truth Tables) There are five operations or functions, the Boolean 
functions, that take as inputs only values from the set {f,t} and produce as outputs 
only values in the same set. The symbols we choose for these functions, one symbol 
for each Boolean connective, are 


F_¬(x), F_∨(x, y), F_∧(x, y), F_→(x, y), F_≡(x, y)

and their behavior is fully described by the following table, known as a truth table.


x  y | F_¬(x) | F_∨(x, y) | F_∧(x, y) | F_→(x, y) | F_≡(x, y)
f  f |   t    |     f     |     f     |     t     |     t
f  t |   t    |     t     |     f     |     t     |     f
t  f |   f    |     t     |     f     |     f     |     f
t  t |   f    |     t     |     t     |     t     |     t

□
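The five Boolean functions are small enough to write out in full. The following Python sketch is only an illustration: the strings "f" and "t" stand in for the two truth values, and the function names `f_not`, `f_or`, `f_and`, `f_imp` and `f_eqv` are hypothetical stand-ins for F_¬, F_∨, F_∧, F_→ and F_≡.

```python
F, T = "f", "t"   # the two truth values, kept distinct from the formal ⊥ and ⊤

def f_not(x):
    return T if x == F else F

def f_or(x, y):
    return T if T in (x, y) else F

def f_and(x, y):
    return T if (x, y) == (T, T) else F

def f_imp(x, y):
    return F if (x, y) == (T, F) else T      # false only when x is t and y is f

def f_eqv(x, y):
    return T if x == y else F

# Print one row per pair of inputs, in the order f f, f t, t f, t t.
print("x y | ¬ ∨ ∧ → ≡")
for x in (F, T):
    for y in (F, T):
        print(x, y, "|", f_not(x), f_or(x, y), f_and(x, y), f_imp(x, y), f_eqv(x, y))
```

Each function is total on {f, t}, so the printed rows are the entire behavior of the five operations; nothing beyond the table is needed.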


The following definition extends a state v so that it can give a Boolean value,
hence meaning, to all formulae. Note that originally v gave a value only to atomic
formulae.
In essence, the definition gives meaning to the Boolean connectives, as is clear
from the fact that there is a case for each connective of how to compute the value.

Pedantry requires that the extension of v below be denoted by a different symbol,
say v̄, since, after all, the extension is a much bigger table. While the original had a
row only for every atomic formula, the extension has a row for every formula. But
now that you know about this quibble, we feel safe to use the same symbol, “v”, for
both.


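1.3.5 Definition. (Value of a Formula in a State v) Below I use the metavariable
“p”, so that the “first” equation actually represents infinitely many, one for each
variable in the alphabet V:


v(p) = whatever we originally assigned to p; t or f
v(⊤) = t
v(⊥) = f
v((¬A)) = F_¬(v(A))
v((A ∧ B)) = F_∧(v(A), v(B))
v((A ∨ B)) = F_∨(v(A), v(B))
v((A → B)) = F_→(v(A), v(B))
v((A ≡ B)) = F_≡(v(A), v(B)) □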


The symbol “=” above is the “equals” sign of the metatheory, and means that the 
left-hand side and right-hand side values are the same (equal). It is not a formal 
symbol for (at least) two reasons: 

(1) Our V does not include “=”.

(2) The definition above compares informal (metatheoretical) values (t and f). 


Why the above definition works is clear at the intuitive level: Lack of ambiguity in
decomposing a nonatomic formula C, i.e., uniquely, as one of (¬A), (A ∧ B), (A ∨
B), (A → B), (A ≡ B), allows us to know how to compute a unique answer. The
why at the technical level is beyond our reach (the demanding reader can find a proof
in [54]).
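The recursive equations of the definition translate almost line for line into a recursive function. The sketch below rests on assumptions that are not in the text: formulae are encoded as nested tuples, a state is represented by a finite Python dictionary covering only the variables that are needed, and the names `BOOL` and `value` are hypothetical.

```python
F, T = "f", "t"

# Hypothetical table of the binary Boolean functions, keyed by connective.
BOOL = {
    "∧": lambda x, y: T if (x, y) == (T, T) else F,
    "∨": lambda x, y: T if T in (x, y) else F,
    "→": lambda x, y: F if (x, y) == (T, F) else T,
    "≡": lambda x, y: T if x == y else F,
}

def value(a, v):
    """Value of the formula a in the state v (a dict from variable names to t or f).

    Encoding assumed here: atoms are strings; ("¬", A) is (¬A); (op, A, B) is (A op B).
    """
    if a == "⊤":
        return T                              # constants get fixed values
    if a == "⊥":
        return F
    if isinstance(a, str):
        return v[a]                           # a variable: look it up in the state
    if a[0] == "¬":
        return T if value(a[1], v) == F else F
    op, b, c = a                              # one recursive call per i.p.
    return BOOL[op](value(b, v), value(c, v))

state = {"p": T, "q": F}
print(value(("→", "p", ("∨", "p", "q")), state))
print(value(("∧", "p", "q"), state))
```

Because the tuple encoding stores the i.p. explicitly, the decomposition step is trivial here. On strings, it is unique readability (1.2.5) that guarantees each recursive call has exactly one choice of arguments.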

The convenience of truth tables can be extended to rephrase the recursive equations
in the above definition (from the 4th equation onward). For example, the 5th equation

v((A ∧ B)) = F_∧(v(A), v(B))

is represented in table form as

A  B | A ∧ B
f  f |   f
f  t |   f
t  f |   f
t  t |   t

The way we read the above is that for all possible values of the not necessarily
atomic formulae A and B we have listed the correct value of A ∧ B. We do not care
how A and B actually obtained their values. Indeed, we do not care how A and B
are built; they can be as complex as they like.

Here is an example of a definition of a “concept” regarding formulae, by induction 
on formulae.


1.3.6 Definition. (Occurrence of a Variable) We define “p occurs in A” and “p 
does not occur in A” simultaneously. 


Occ1. (Atomic) p occurs in p. It does not occur in any of q, ⊤, ⊥, where q is a
variable distinct from p.


Occ2. p occurs in (¬A) iff it occurs in A.


Occ3. p occurs in (A ∘ B), where ∘ is one of ∧, ∨, →, ≡, iff it occurs in A or B
or both.³³ □


1.3.7 Remark. We wanted to be user-friendly (which often means “sloppy”) in the
first instance, and said that the above defines a “concept”: “Occurs”/“does not occur”.
In reality, all such “concepts” that we may define by recursion on formulae are just
functions.

For example, this “concept” can be captured by the function “occurs(p, A)”,
where occurs(p, A) = 0 means “p occurs in A”, and occurs(p, A) = 1 means “p
does not occur in A”. □
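The function of this remark can be written down directly from Occ1 through Occ3. The following sketch is illustrative only and assumes a nested-tuple encoding of formulae that is not in the text; it keeps the remark's convention that 0 means “occurs” and 1 means “does not occur”.

```python
def occurs(p, a):
    """Return 0 if the variable p occurs in the formula a, and 1 if it does not.

    Encoding assumed here: atoms are strings; ("¬", A) is (¬A); (op, A, B) is (A op B).
    """
    if isinstance(a, str):                    # Occ1: p occurs in p and in no other atom
        return 0 if a == p else 1
    if a[0] == "¬":                           # Occ2: same answer as for the i.p.
        return occurs(p, a[1])
    _, b, c = a                               # Occ3: occurs in b, or in c, or in both
    return 0 if occurs(p, b) == 0 or occurs(p, c) == 0 else 1

formula = ("→", "p", ("¬", "q"))
print(occurs("p", formula), occurs("q", formula), occurs("r", formula))
```

The “does not occur” half of the simultaneous definition needs no separate code: it is simply the other output of the same function.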


1.3.8 Remark. (Finite “Appropriate” States) A state v is by definition an infinite 
table. Intuitively, the value of a formula A in any state v should depend only on 
the values of the variables that occur in A and on no others. Thus, for any one A, 
the state could be truncated into a finite table “appropriate” just for A—defined on 
all the variables of A but undefined elsewhere—without altering the Boolean value 
of A. Such a table would have one row for each variable that occurs in A, plus the 
two rows for ⊥ and ⊤, which in the end can be omitted as they offer no surprises.

Our intuition is correct. Here is a proof by induction on A of the relevant statement: 
If v and v' are two states that agree on the variables of A, then v(A) = v'(A), where 
“=” is metamathematical equality on the set {f, t}. 

The proof, as it must, goes back and forth among Definitions 1.3.6 and 1.3.5 while, 
tacitly, it is also mindful at all times of how formulae are formed, going from less to 
more complex ones (Definitions 1.1.5 and 1.1.10). 

Basis. If A is atomic, then either

(1) It is a constant, and hence v(A) = v′(A) by Definition 1.3.5, equations two
and three,

or

(2) It is p. By assumption, v(p) = v′(p). Okay!

Complex formulae have two main shapes:


Case where A is (¬C): The I.H.³⁴ applies to C. We use it as follows:

(1) Since (1.3.6) C and ¬C have precisely the same variable occurrences, it follows that
v and v′ agree on the variables of C.

(2) By I.H. we get v(C) = v′(C), and so v(¬C) = v′(¬C) by 1.3.5.


Case where A is C ∘ D: By Definition 1.3.6 (Occ3), all the variables of C and
of D also occur in C ∘ D; therefore, by assumption, v and v′ agree on all
the variables that occur in C and D. By I.H. v(C) = v′(C) and v(D) = v′(D).
By 1.3.5, v(C ∘ D) = v′(C ∘ D). This concludes the proof. □


³³ Needless to say, by the “iff”, it does not occur exactly when we have both: It does not occur in A and it
does not occur in B.


1.3.9 Definition. (Tautologies) Boolean logic is primarily interested in those formulae
that are true (t) in all possible states. Such formulae are called tautologies and
because of their “shape”, i.e., the way they are put together from atomic formulae,
brackets, and connectives, are “always” true. We use the shorthand notation ⊨_taut A
to indicate that A is a tautology. □


In view of 1.3.8, when checking a formula A for tautology status, we need to
check it only on all finite states appropriate for A. It is a tautology if and only if
its value in all those states is t. Thus there is a finite process to do this, but one
that at the present state of the art is ridiculously inefficient: To so check
an A that has n Boolean variables (occurring in it) we need a truth table of 2^n rows.
Current research on the “P = NP?” question of Cook ([5]) converges toward the
opinion that it is highly unlikely that we will ever have a way to check tautologyhood,
deterministically, in a manner appreciably more efficient than constructing a truth
table.
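The truth-table process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the encoding of formulae as nested tuples, and the names `variables`, `value`, and `is_tautology`, are hypothetical choices made here and are not the book's notation.

```python
from itertools import product

# Hypothetical encoding of formulae (an assumption of this sketch):
#   "top", "bot"        the constants ⊤ and ⊥
#   any other string    a Boolean variable
#   ("not", C)          the formula (¬C)
#   (op, C, D)          the formula (C op D), op in BINARY below

BINARY = {
    "and":   lambda x, y: x and y,
    "or":    lambda x, y: x or y,
    "imp":   lambda x, y: (not x) or y,
    "equiv": lambda x, y: x == y,
}

def variables(a):
    """Set of the variables that occur in formula a."""
    if isinstance(a, str):
        return set() if a in ("top", "bot") else {a}
    return set().union(*(variables(sub) for sub in a[1:]))

def value(a, state):
    """Truth value of a in a finite state (dict: variable -> bool)."""
    if a == "top":
        return True
    if a == "bot":
        return False
    if isinstance(a, str):
        return state[a]
    if a[0] == "not":
        return not value(a[1], state)
    return BINARY[a[0]](value(a[1], state), value(a[2], state))

def is_tautology(a):
    """Evaluate a in all 2**n finite states appropriate for it."""
    names = sorted(variables(a))
    return all(
        value(a, dict(zip(names, row)))
        for row in product([True, False], repeat=len(names))
    )
```

For instance, `is_tautology(("imp", "p", "p"))` holds while `is_tautology(("imp", "p", "q"))` does not. The loop visits every one of the 2^n rows, which is exactly the inefficiency noted above.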

This is one additional reason why we are interested in discovering tautologies in 
a different way, nondeterministically, one that allows shortcuts in the calculation by 
allowing the prover to guess the correct next step from a set of candidates, whenever 
such a choice is offered, thus avoiding having to check all possible avenues that 
offer themselves. These guesses, when possible,³⁵ are informed by experience and
human intuition and ingenuity and shorten the process of tautology verification. Enter 
(syntactic) “proofs” of the next section. 


1.3.10 Example. (Some tautologies) ⊤ and p → p are tautologies. The latter fol-
lows from v(p → p) = F→(v(p), v(p)) and the truth table on p. 29.

How about p → q → p? First, remember that this is sloppy for (p → (q → p)).
Thus the last connective to act is the leftmost.


34 The I.H. is invariably “assume the claim for all formulae less complex than A”. As such, it deserves no
explicit mention. 

35 It is not known whether there is a fast nondeterministic algorithm that verifies every tautology. But even
if we discover one such, it is unlikely in view of what I said above that we can eliminate the nondeterminism 
without significant speed loss. 
Here’s the general technique: The table below has two components. The
state-part consists of the first two columns. Each row of these columns is a finite state
appropriate for the formula.

The value of the formula in each state is found beneath the last-to-act connective 
in the process of 1.1.3. To compute this value, we use the values v(p) and that of 
q → p. The values of the latter are aligned under the relevant connective, the “→”
of the subformula q → p. In general, in the value part we align the values that we
compute under the connective that acted last in the subformula we are evaluating.


The number headings (1)—(3) above indicate the order in which the columns are 
built. By the way, we have just verified that ⊨taut p → q → p.
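The column-by-column construction can be replayed mechanically. The sketch below is a hedged illustration and does not reproduce the book's table layout: the helper names `imp` and `table_p_q_p` are hypothetical. It first computes the column that belongs under the “→” of q → p and then the column under the last-to-act connective.

```python
from itertools import product

def imp(x, y):
    """Truth function of implication: f only for the pair (t, f)."""
    return (not x) or y

def table_p_q_p():
    """Rows (p, q, value of q -> p, value of p -> (q -> p))."""
    rows = []
    for p, q in product([True, False], repeat=2):
        inner = imp(q, p)      # aligned under the "->" of the subformula q -> p
        outer = imp(p, inner)  # aligned under the last-to-act "->"
        rows.append((p, q, inner, outer))
    return rows

for row in table_p_q_p():
    print(" ".join("t" if x else "f" for x in row))
```

Every row ends in t, which is the content of the claim that p → q → p is a tautology.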

What about A → B → A where A, B are arbitrary formulae? Can we settle
this question without knowing the particular ways that A, B are put together from
variables, connectives, and brackets?

Yes! No matter how they are put together, in any state v we have that each of A, B
attains one of two values t or f. Thus the exact same table as the above, but this time
using A, B and A → B → A as column headings,


settles the question: We have ⊨taut A → B → A.

We must be sure to read the above table correctly. We are not saying that we are 
assigning values to A and B (we can assign values only to variables and constants). 
We are saying that the possible pairs of values of A and B—in that order—no 
matter what state we are in, will be among the four listed in the table. Then using 
Definition 1.3.5 we fill in the last two columns of the table. □


There are three more concepts related to tautologies that we want to introduce, but 
first some notation: 
We use the metavariables p, q, r, ... for variables, and A, B, F, Q for formu-
lae. What shall we use for sets of formulae?
Convention: We denote sets of formulae by certain capital Greek letters, as a rule,
by those that cannot be confused with Latin letters. Thus Γ, Δ, Σ, Φ will always
stand for sets of formulae (such sets may have zero, one, two, three, one million, or 
infinitely many members). Of course, we use these letters to denote such sets if either 

(i) We do not care what are the members of a set of formulae 
or 

(ii) We do care, but we are going to refer to that set over and over again in an 
argument; thus rather than, say, writing {⊥, p, p → q, p → ¬q} over and over again,
we may give it a name saying, “Let Σ stand for {⊥, p, p → q, p → ¬q}.”³⁶

By the way, we must not confuse the set {A} with its member A. These are 
different. Think in terms of types (as in programming languages): {A} is an object 
of type set while A is an object of type formula. 


1.3.11 Definition. A formula A is satisfiable iff there is at least one state v where
v(A) = t. A set of formulae Γ is satisfiable iff there is at least one state v where for
every formula A in Γ, v(A) = t. We say that v satisfies Γ.

We say that Γ tautologically implies A, and write Γ ⊨taut A, iff for every state
v that satisfies Γ we must have v(A) = t. We call Γ the hypotheses (plural, in
general) or premises of the implication, while A is the conclusion.

We say that a formula A is unsatisfiable or a contradiction iff for every state v,
we have v(A) = f. We say that a set Γ is unsatisfiable iff for every state v there is at
least one A in Γ such that v(A) = f. □
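For a finite set of formulae over finitely many variables, the three notions of this definition translate directly into quantifiers over finite states. The following sketch assumes a hypothetical encoding in which a formula is a Python function from a state (a dict from variable names to truth values) to a truth value; the names `states`, `satisfiable`, `taut_implies`, and `unsatisfiable` are ours, not the book's.

```python
from itertools import product

def states(names):
    """All finite states over the given variable names."""
    for row in product([True, False], repeat=len(names)):
        yield dict(zip(names, row))

def satisfiable(gamma, names):
    """Some state makes every member of gamma true."""
    return any(all(f(v) for f in gamma) for v in states(names))

def taut_implies(gamma, a, names):
    """Every state that satisfies gamma also makes a true."""
    return all(a(v) for v in states(names) if all(f(v) for f in gamma))

def unsatisfiable(gamma, names):
    """Every state falsifies at least one member of gamma."""
    return not satisfiable(gamma, names)

# Hypothetical sample formulae over the variables p and q.
p = lambda v: v["p"]
q = lambda v: v["q"]
p_imp_q = lambda v: (not v["p"]) or v["q"]
p_imp_not_q = lambda v: (not v["p"]) or (not v["q"])

print(taut_implies([p, p_imp_q], q, ["p", "q"]))
print(unsatisfiable([p, p_imp_q, p_imp_not_q], ["p", "q"]))
```

Both calls print `True`. The sketch cannot handle an infinite set of formulae, which the definition itself allows.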


By convention, logicians write the simpler A ⊨taut B for the correct {A} ⊨taut B,
and more generally, prefer to write A1, ..., An ⊨taut B rather than the correct (but
pedantic) {A1, ..., An} ⊨taut B.

Note that intuitively, a tautology A is true, no questions asked, and must be 
accepted as such. On the other hand, the conclusion of a tautological implication is 
only relatively true, relative to the premises, that is: If we accept the premises as
true, then we must also accept the conclusion as true. 

This relativity of truth is at the heart of mathematics. For example, if we accept 
Euclid’s “5th postulate”³⁷ as true, then we must accept that the sum of the angles of
any triangle equals 180 degrees. This is a relative truth,³⁸ since Euclid’s 5th postulate
is not an absolute truth. Accepting any one of its possible negations³⁹ leads to a
totally different (relative) truth regarding the sum of angles in a triangle. 


36 We may also say, “Let Σ = {⊥, p, p → q, p → ¬q}.”

37 It states that through a point that lies outside a given line it is always possible to draw a unique line
parallel to the given one.

38 Strictly speaking, this example lies beyond Boolean logic, since we need predicate logic in order to
“speak” and do Euclidean geometry. Nevertheless it illustrates the phenomenon of relativity of truth in a
familiar branch of mathematics.

39 One possible negation is that you can draw two parallels (Lobachevsky). Another is that you can draw
no parallels at all (Riemann).
1.3.12 Example. We can in principle verify A1, ..., An ⊨taut B via a truth table


We look only at those rows that have t everywhere between the two “||” vertical 
dividers. We must ensure that these rows have a t under B as well. By the way, the 
Ai and B being, in general, complicated formulae, we need the “p, q, ...” columns to
the left of the leftmost || in order to compute the values of the Ai and B in all states.

In general, when checking A1, ..., An ⊨taut B, one cannot avoid building the
whole truth table (or doing some equivalently laborious task) for the following 
reasons: 

(1) As we have said, at the present (and foreseeable) state of the art there is no 
way to check whether A is a tautology any more efficiently than it takes to build the 
whole truth table, in terms of the variables p,q,..., of A. 

(2) If we had a substantially faster algorithm for tautological implication, we could 
use it to also establish tautologyhood fast, since ⊨taut A iff ⊤ ⊨taut A.

However, in many cases of practical interest we can do better since we need to 
only check the rows that have exclusively t’s between the two || dividers, and ignore 
all the other rows. Sometimes we can identify these rows without building the whole 
table. 

For example, we can quickly see that A, B ⊨taut A ∧ B since whenever v(A) =
v(B) = t we have v(A ∧ B) = t (cf. 1.3.5). We did not compute the other three
rows. A more substantial example is


A ∨ B, ¬A ∨ C ⊨taut B ∨ C


If this is done by the full table method we need 8 rows to compute the values of
A ∨ B, ¬A ∨ C, B ∨ C for all possible ordered triples of values v(A), v(B), v(C).
But instead, let us be clever: Okay; we merely need to compute the value v(B ∨ C)
on the condition that
v(A ∨ B) = v(¬A ∨ C) = t    (1)


We analyze (1) according to two cases:
(i) v(A) = f: By (1) and 1.3.5, v(B) = t. But then v(B ∨ C) = t by 1.3.5.
(ii) v(A) = t: By (1) and 1.3.5, v(C) = t. But then v(B ∨ C) = t by 1.3.5. □
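The case analysis can be cross-checked by brute force. The sketch below (function names hypothetical) lists the value triples for A, B, C that make both premises true and confirms that B ∨ C is true in each of them. It treats A, B, C as carrying arbitrary truth values, in the spirit of the remark in 1.3.10.

```python
from itertools import product

def premise_rows():
    """Triples (a, b, c) of values of A, B, C with A ∨ B and ¬A ∨ C both t."""
    return [
        (a, b, c)
        for a, b, c in product([True, False], repeat=3)
        if (a or b) and ((not a) or c)
    ]

def conclusion_holds():
    """True iff B ∨ C is t in every row selected by premise_rows()."""
    return all(b or c for (_a, b, c) in premise_rows())

print(len(premise_rows()), conclusion_holds())  # prints: 4 True
```

Only 4 of the 8 rows matter, and in all of them the conclusion has value t.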


1.3.13 Example. Here is a really important example: ⊥ ⊨taut A. Well, I need to
ensure that for every v, where v(⊥) = t, I also get v(A) = t.

Since no v satisfies v(⊥) = t (i.e., there are no rows to check), I am done.

We have seen such situations before. The statement is vacuously true. Like before,
one can explain this by saying, “The only way to refute ⊥ ⊨taut A is to find a v where
v(⊥) = t but v(A) = f. But such a task must fail as no v satisfies v(⊥) = t.” □
Here is a simple but nice exercise that you must try:


1.3.14 Exercise. (1) Show that ⊨taut A iff ∅ ⊨taut A.

(2) Show that A1, ..., An ⊨taut B iff ⊨taut A1 → A2 → ... → An → B.

(3) Show that Γ ⊨taut B iff Γ ∪ {¬B} is unsatisfiable. We note here that for two
sets Γ and Σ the notation Γ ∪ Σ, called the union of Γ and Σ, denotes the set that
contains every member of Γ and every member of Σ. □


We conclude this section with the definition of substitution in a Boolean expression, 
and with the proof of two important properties of substitution. 

Intuitively, the symbol “A[p := B]” is shorthand that means—i.e., expands into— 
the string we get if we replace all occurrences of the variable p in A by the formula B. 
We may think of this operation as defining a function from formulae to strings: 


Input: A, p, B; output: the string denoted by A[p := B].


In order to read the following definition of substitution correctly, we emphasize that 
the “operation” [p := B] takes place in the metatheory and has the highest priority 
against all other “formal operations”, i.e., the Boolean connectives ¬, ∧, ∨, →, ≡.
For example, ¬A[p := B] means ¬{A[p := B]}, where the symbols “{, }” are here
meta-brackets inserted to indicate the order of application of “operations”. Naturally,
because of its placement, this operation is left-associative, so that A[p := B][q := C]
means {A[p := B]}[q := C].


1.3.15 Definition. (Substitution in Formulae) In what follows, “=” denotes equal-
ity in the metatheory, here between strings. The definition states the obvious: (1) It
handles the basis cases in the trivial manner, and (2) when A is actually built from
i.p. (cf. 1.1.10), it says that we substitute into each i.p. first, and then apply the
connective.

As we usually do in definitions we are careful not to use formula abbreviations.
All brackets are present. Note the use of metavariables.


              B                          if A = p
              A                          if A = q (where p ≠ q), or
A[p := B] =                                 A = ⊤, or A = ⊥
              (¬C[p := B])               if A = (¬C)
              (C[p := B] ∘ D[p := B])    if A = (C ∘ D)


where ∘ is one of ∧, ∨, →, ≡. □
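The four cases of the definition map one-to-one onto a recursive function. The sketch below assumes a hypothetical tuple encoding of formulae: `"top"` and `"bot"` for the constants, any other string for a variable, `("not", C)` for a negation, and `(op, C, D)` for a binary connective. The name `subst` is ours.

```python
def subst(a, p, b):
    """Return A[p := B]; p must be a variable name, a and b formulae."""
    if a == p:
        return b                           # case 1: A is the variable p
    if isinstance(a, str):
        return a                           # case 2: another variable, ⊤ or ⊥
    if a[0] == "not":
        return ("not", subst(a[1], p, b))  # case 3: A is (¬C)
    op, c, d = a
    return (op, subst(c, p, b), subst(d, p, b))  # case 4: A is (C ∘ D)

# A[p := B] with A = (p → (¬q)) and B = (r ∧ ⊤)
example = subst(("imp", "p", ("not", "q")), "p", ("and", "r", "top"))
print(example)
```

Left-associativity of the operation corresponds to nesting the calls: A[p := B][q := C] is `subst(subst(a, "p", b), "q", c)`.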


We state and prove two easy and hardly unexpected properties of substitution: 


1.3.16 Proposition. For any formulae A and B and variable p, A[p := B] is a
(well-formed!) formula.⁴⁰


40 A proposition is a theorem—here, metatheorem—that did not quite make it to be called that. You 
see, people reserve the term theorem, or metatheorem, for the “important” or earth-shattering stuff that 
Proof. Induction on A, keeping an eye on Definition 1.3.15.
Basis. If A is atomic, then A[p := B] is either A or B, a formula in either case.


Complex formulae have two main shapes: 

A is (¬C): The I.H. applies to C, thus C[p := B] is a formula. Then so is
(¬C[p := B]) by 1.1.7.

A is (C ∘ D): The I.H. applies to C and D, thus C[p := B] and D[p := B] both
are formulae. Then so is (C[p := B] ∘ D[p := B]), again by 1.1.7. □


1.3.17 Proposition. If p does not occur in A, then A[p := B] = A, where for
convenience we once more use “=” as metamathematical equality of strings.


Proof. Again, by induction on formulae A: 
Basis. A can only be one of q (q ≠ p), ⊤, or ⊥, by Definition 1.3.6. Then, by
Definition 1.3.15 (2nd case), A[p := B] = A. Okay, so far. 


Complex formulae have two main shapes: 


A is (¬C): Does p occur in C? No, by assumption (it does not occur in A) and
definition of occurrence (1.3.6). Now, the I.H. applies to C, thus C[p := B] = C. 
We are done by 1.3.15, case 3. 


A is (C ∘ D): By 1.3.6 (Occ3) p occurs neither in C nor in D. The I.H. applies
to both these subformulae; thus C[p := B] = C and D[p := B] = D, and we are
done by 1.3.15, case 4. □


1.4 PROOFS AND THEOREMS 


We are ready to develop a calculus that we may use to write down theorems. We 
will learn to “calculate theorems” just as we have learned to “calculate” (= “parse”)
formulae. 

Boolean logic is a (crude) vehicle through which we formulate and establish 
mathematical truth. This truth is captured absolutely (tautologies) or relatively to 
certain premises (tautological implications). Thus, when we do Boolean logic, our 
main task is to discover and verify tautologies, and more generally, to discover and 
verify tautological implications. 

We have already remarked that the presently known mechanical ways to check 
for tautology status as well as to verify tautological implications are hopelessly 
inefficient, and that there is every indication that an efficient tautology (or tautological 
implication) checker will never be discovered.

One turns to utilizing human ingenuity and experience—in other words, utilizing 
(educated) guessing—toward effecting serious shortcuts in the process of certifying 
tautologies and tautological implications. This guessing (or “nondeterministic”)


we prove. All else that we prove are just propositions (or lemmata—singular: lemma) if they have just 
“auxiliary status”, just like FORTRAN subroutines; or they are corollaries, if they follow trivially—more 
or less—from earlier results. 
process of certifying tautologies and tautological implications is syntactic rather than
truth table driven⁴¹ (semantic) and is called theorem proving.

The theorems that we will learn to prove with this new syntactic technique will 
be either absolute truths (tautologies) or truths that are relative (conclusions of tau- 
tological implications) to certain assumptions that we have accepted. 

Our major concern as we are founding this syntactic proof calculus will be to ensure 
that whatever tools we utilize are capable of certifying all absolute and relative truths, 
and only those. That is, these tools will never “certify” a falsehood as a “theorem”. 
Our degree of success in implementing these requirements will be assessed later 
(Section 3.1 will assess the promise for only those, while 3.2 will assess the one for 
all).

As I said before, I kind of like the terms calculate theorems and theorem calculus 
as they remind us that proving theorems is a precise syntactic (synonym for formal, 
i.e., depending only on form) algorithmic process. This observation is the origin
of the alternative name of equational logic of [11]: calculational logic. However, 
most logicians and mathematicians would proclaim that they “proved” (rather than 
“calculated”) a theorem and would rather call a theorem-calculation a proof. 


First off, a theorem-calculation or proof is a finite sequence of formulae, entirely 
analogous to formula-calculations. Each formula occurring in a proof will be called 
a theorem. 

The specifications of formula-calculations are the following two: 

(1) A formula-calculation must start with the simplest possible kind of formula: 
one that is “primitive”, i.e., atomic. 

(2) Every operation that extends the formula-calculation must preserve the prop- 
erty “formula”: either it is the trivial operation of writing down an atomic formula, 
or it is an operation that acts on previously written formulae and produces a formula. 


Entirely analogously, as it flows from the preceding discussion in this section, the 
specifications of theorem-calculations are the following two: 


(1’) A theorem calculation must start with the writing down of a formula that is 
among the simplest possible theorems—a “primitive theorem” for which the valida- 
tion process is simply to write it down! We call such a formula an axiom. 

(2’) Every operation that extends the theorem-calculation must preserve the prop-
erty “theorem”; thus, it is either the trivial act of writing down another axiom, or it is
applied to already-written theorems, resulting in a new theorem. Since a theorem
calculation must certify truth, these nontrivial operations, or rules, must preserve
truth. Technically, whenever they are applied to formulae A1, ..., An and yield a
formula B as a result, it is necessary that they obey A1, ..., An ⊨taut B.


We have two types of axioms: The logical axioms are certain well-chosen⁴²
absolute truths; therefore, they are certain tautologies. The other type we will call 
special axioms, but also assumptions or hypotheses. These are not fixed outright, but 


41 Later there will be a weakening of this somewhat dogmatic statement.
42 The qualifier well-chosen will be revisited later.
may change from discussion to discussion.⁴³ They are not deliberately chosen⁴⁴ to be
absolute truths, but are formulae that we “accept as true” (cf. discussion on relativity 
on p. 34) simply because we are interested to explore what sort of (tautological) 
conclusions we may draw from them. 

In intuitive terms—since the quest for theorems is the quest for “mathematical 
truths”, absolute or relative—the axioms are our initial truths. Our rules of reasoning 
will allow us to derive further truths from, and relative to, the axioms. 

The nontrivial operations that lengthen proofs (cf. (2’) above) are called rules of 
inference. To achieve the purely syntactic character of proofs, the rules of inference 
are applied in a manner that the input/output relation is purely syntactic. For example, 
one of the two primitive⁴⁵ rules, soon to be introduced, applies to any formulae of the
forms A and A ≡ B and “outputs” B. The rule does not care about which specific
formulae A or B stand for, nor about the semantics of any of A, B, A ≡ B. The only
thing that the rule cares about is that it “sees” as input an equivalence on one hand, 
and the first formula of this equivalence on the other. It immediately “knows” that it 
must “output” the second formula of the equivalence. 


In order to describe the rules of inference, we need formula schemata⁴⁶ (or, simply,
schemata). 

A schema is a string in the metatheory (i.e., outside logic) over the augmented 
alphabet that along with V (p. 9) includes the symbols “[”, “:=”, and “]”, and all the
syntactic variables for formulae and Boolean variables. 

The syntactic structure of a schema is, by definition, such that if we replace all
the syntactic variables that occur in it by any specific formulae and variables, as 
appropriate, then the result names a formula of WFF. 


1.4.1 Definition. (Schema Instance) An instance of a schema is the formula we 
obtain if we replace all its metavariables with specific objects (formulae/Boolean 
variables) as appropriate. □


Here are some examples of schemata: 

(1) A: It is a formula-metavariable; if we replace the letter “A” by some formula, 
well, we get that formula as the result! 

(2) (A ≡ B): This schema has two formula-metavariables, A and B. Whatever
formulae we may replace A and B with, we get a formula by 1.1.7. 

(3) A[p := B]: This schema has two formula-metavariables, A and B, and
a Boolean metavariable, p. Whatever formulae we replace A and B with, and 
whichever Boolean variable replaces p, we get a formula by 1.3.16. 


43 For this reason, I suppose, some people call them temporary assumptions. However, temporary is not a
technical term. For example, Euclid’s 5th postulate has no expiry date. Yet it is not an absolute truth.
44 The qualification chosen is picked purposely: I do not want to rule out as special axioms any that, either
by accident or on purpose, have been chosen to be tautologies. Analogously, when we defined the notation
Γ ⊨taut A, quite correctly we did not forbid the case where some formulae of Γ may be tautologies.
45 We will encounter many derived rules. These are not “given” or postulated up front, but we prove their
validity.

46 Schema, plural schemata (and, incorrectly but often, schemas), is Greek for form and figure.
We often write the rules of inference as “fractions”, like


P1, P2, ..., Pn
---------------
       Q


where all of P1, ..., Pn, Q are formula schemata. We call the “numerator” the
premise (case n = 1) or premises (case n > 1) and the “denominator” the conclusion
or result of the rule. Instead of premises we also say hypotheses or assumptions.
The Pi and Q are syntactically related so that one can mechanically check this
input/output relation by simply looking at the form of the Pi and Q.
We have already noted an example of this mechanical applicability of a rule such 
as 


A, A ≡ B
--------   (R)
   B


The input/output relation of a rule need not be “functional”; that is, the result of a 
rule need not be uniquely determined by the hypotheses (cf. Leibniz rule below). 


It is obvious why a rule is expressed in terms of schemata rather than specific 
formulae. Schemata allow a rule to be applicable to infinitely many formulae. If 
conclusion and premises were specific formulae, then there would be just one case 
where the rule would be applicable, which would hardly qualify it to be called a rule, 
a term that creates the expectation of applicability to a vast number of cases. Here 
is an analogy: The input/output relation on numbers “in:3 / out:9” is not a rule, but 
“in: x / out: x²” is.


How a rule like (R) above is applied will be clear in Definition 1.4.5.


Let us finally introduce the two primitive rules of Boolean logic that we will adopt 
in this volume. 


1.4.2 Definition. (Rules of Inference) The following two are our primitive or pri-
mary rules of inference, given with the help of the syntactic variables A, B, p, C:


Inf1
        A ≡ B
---------------------   (Leibniz)
C[p := A] ≡ C[p := B]


Inf2
A, A ≡ B
--------   (Equanimity)
   B


An instance of a rule of inference is obtained by replacing all the letters A, B, C and
p by specific formulae and a specific variable respectively. □
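Since both rules look only at the form of their inputs, each can be written as a small function on syntax trees. The sketch below is an illustration under assumptions: formulae are encoded as nested tuples (`"top"`, `"bot"`, variable names as other strings, `("not", C)`, `(op, C, D)` with `"equiv"` standing for ≡), and the names `subst`, `is_equiv`, `leibniz`, and `equanimity` are hypothetical.

```python
def subst(a, p, b):
    """A[p := B] on the tuple encoding (see the lead-in for the encoding)."""
    if a == p:
        return b
    if isinstance(a, str):
        return a
    return (a[0],) + tuple(subst(sub, p, b) for sub in a[1:])

def is_equiv(f):
    """True iff f has the form (A ≡ B)."""
    return isinstance(f, tuple) and len(f) == 3 and f[0] == "equiv"

def leibniz(premise, c, p):
    """Inf1: from A ≡ B produce C[p := A] ≡ C[p := B]."""
    if not is_equiv(premise):
        raise ValueError("premise must have the form A ≡ B")
    _, a, b = premise
    return ("equiv", subst(c, p, a), subst(c, p, b))

def equanimity(a, premise):
    """Inf2: from A and A ≡ B produce B."""
    if not is_equiv(premise) or premise[1] != a:
        raise ValueError("premises must have the forms A and A ≡ B")
    return premise[2]

# One input, two different Leibniz outputs (different choices of C).
q_equiv_r = ("equiv", "q", "r")
print(leibniz(q_equiv_r, ("or", "p", "s"), "p"))
print(leibniz(q_equiv_r, ("not", "p"), "p"))
```

The caller supplies C and p to `leibniz`, so one premise A ≡ B can yield many different conclusions. Nothing in either function inspects truth values.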


The rule names conform to those in [17]. 


1.4.3 Remark. (1) Why primary? Do we also have “secondary rules”? Yes. We will 
soon learn that we can apply additional rules in a theorem-calculation, which are not 
mentioned in the definition of proof below (1.4.5) because they are not theoretically
necessary toward defining proof and theorem. Such additional rules are not to be
arbitrarily added to our toolbox without question. Instead, we will show before 
adding them, via a rigorous mathematical argument, that we are allowed to use them. 
This “allowed” means that there is nothing we can prove using these additional rules 
that we cannot prove without them. We call such additional rules derived rules, or 
secondary rules. 

Compare, once again, with programming languages. Some general-purpose pro- 
cedural languages were designed not to contain the instruction goto because it was 
considered “harmful” by some influential computer scientists in the 1970s (cf. [10],
which, arguably, started it all)—but not by all (cf. [28]). Nevertheless, one can prove 
that goto can be simulated by the originally given instructions. It is a “derived” kind 
of instruction in those goto-less languages. 

Harmful or not, one will agree that adding one more tool does add to convenience, 
in general. 

(2) Other than the “restriction” that the A, B, C and p are metavariables of the
agreed-upon kind, there is no other restriction on the letters. In particular, we have
no assumption on whether p actually occurs in C of the Leibniz rule. Either way, it
is all right.

(3) We have already discussed the mechanical nature of Inf2. That also Inf1 is
mechanically applicable is clear: Once we have written “A ≡ B”, we can pick any
formula C whatsoever and any variable p and construct the output, first effecting two
substitutions and then connecting the results with the connective “≡” in the indicated
order.

Note that the Leibniz rule is not functional: Infinitely many different outputs are
possible for a given input A ≡ B. □


We now turn to our choice of axioms. 


Schemata, again. We noted already that the axioms will be “initial truths”, and as
such they will be selected tautologies. 

Suppose then, for the sake of discussion, that one of the tautologies of choice to 
attain axiom status is p ≡ p.

But how about q ≡ q? Or how about p ∨ p’ ≡ p ∨ p’ and, indeed, how about
(p ∨ p’ ≡ p ∨ p’) ≡ (p ∨ p’ ≡ p ∨ p’)?

All these have the same form, namely A ≡ A, where A stands for an arbitrary
formula. Naturally, if we want to include p ≡ p, then we will want to include all
those tautologies that have the same “shape”, A ≡ A, as well, since surely they all
state the same (absolute) principle: “Every statement is equivalent to itself.” If this is
a “truth” worth postulating, then it would be absurd to do so just for one of its special
cases, just for the variable p.

There are two main ways to achieve this generality: 

(1) The “modern” and rather obvious way: Rather than saying, “include p ≡ p
and q ≡ q and p ∨ p’ ≡ p ∨ p’ and (p ∨ p’ ≡ p ∨ p’) ≡ (p ∨ p’ ≡ p ∨ p’) and ...”,
we say, “include the schema A ≡ A”.

This means include all instances of A ≡ A (cf. 1.4.1).
(2) The “old” way: “Include just p ≡ p; however, add a new primary rule of
inference called substitution.” This rule


    A
---------   (Sub)
A[p := B]

when applied to the formula “p ≡ p” (that is, take A to be p ≡ p and p to be p) will
be able to generate, by successive applications, all possible tautologies of the form
B ≡ B.

There is a catch: Once you add rule (Sub) as a primary rule, you have to awkwardly
hedge when writing proofs. The rule cannot be used in a theorem-calculation unless 
you know that the variable p does not occur in any formulae that are special axioms. 
This restriction in turn makes a very useful tool that we will soon obtain and learn to 
use—the deduction theorem—hard to state and even harder to apply. 

Thus, the old way is rightfully abandoned. In particular, we will not have any use 
for (Sub).⁴⁷


We next present the list of logical axioms for Boolean logic. The list is infinite, 
but because of the use of schemata it can be presented by a finite table. That is, 
there are only finitely many different forms of tautologies that we need to take as our 
starting point. These are largely the ones in [17] with a few adjustments. 

What the axioms do is to codify the most basic properties of the connectives. 
The following list both presents (in partially parenthesized notation⁴⁸) and names the
axioms. 


1.4.4 Definition. (Logical Axioms of Boolean Logic) In what follows, A, B, C de- 
note arbitrary formulae: 


Properties of ≡

    Associativity of ≡              ((A ≡ B) ≡ C) ≡ (A ≡ (B ≡ C))     (1)
    Symmetry of ≡                   (A ≡ B) ≡ (B ≡ A)                 (2)

Properties of ⊥, ⊤

    ⊤ vs. ⊥                         ⊤ ≡ ⊥ ≡ ⊥                         (3)

Properties of ¬

    Introduction of ¬               ¬A ≡ A ≡ ⊥                        (4)

Properties of ∨

    Associativity of ∨              (A ∨ B) ∨ C ≡ A ∨ (B ∨ C)         (5)
    Symmetry of ∨                   A ∨ B ≡ B ∨ A                     (6)
    Idempotency of ∨                A ∨ A ≡ A                         (7)
    Distributivity of ∨ over ≡      A ∨ (B ≡ C) ≡ A ∨ B ≡ A ∨ C       (8)
    Excluded Middle                 A ∨ ¬A                            (9)

Properties of ∧

    Golden Rule                     A ∧ B ≡ A ≡ B ≡ A ∨ B             (10)

Properties of →

    Implication                     A → B ≡ A ∨ B ≡ B                 (11)


47 The current edition of [17] uses the old approach in Boolean logic (their Chapter 3) but switches to the
modern approach for predicate logic (Chapters 8 and 9). One may assume that by the time the authors
decided that the approach with schemata is better it was too late in terms of publication deadlines to rewrite
Chapter 3 with schemata.

48 Recall that brackets associate from right to left.

We will reserve the capital Greek letter “lambda”, Λ, to denote the set of all logical
axioms. This set is, of course, infinite. □
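Since the logical axioms are meant to be tautologies, one can sanity-check each schema by letting A, B, C range over the two truth values, as in the argument of 1.3.10. The sketch below does this under two stated assumptions: the schemata are read with ≡ associating to the right and with ¬, ∧, ∨, → binding tighter than ≡, and the names `eq`, `imp`, `AXIOMS`, and `failing_axioms` are hypothetical.

```python
from itertools import product

def eq(x, y):
    """Truth function of ≡."""
    return x == y

def imp(x, y):
    """Truth function of →."""
    return (not x) or y

# Each entry evaluates one schema on truth values a, b, c of A, B, C.
AXIOMS = {
    1: lambda a, b, c: eq(eq(eq(a, b), c), eq(a, eq(b, c))),    # associativity of ≡
    2: lambda a, b, c: eq(eq(a, b), eq(b, a)),                  # symmetry of ≡
    3: lambda a, b, c: eq(True, eq(False, False)),              # ⊤ ≡ ⊥ ≡ ⊥
    4: lambda a, b, c: eq(not a, eq(a, False)),                 # ¬A ≡ A ≡ ⊥
    5: lambda a, b, c: eq((a or b) or c, a or (b or c)),        # associativity of ∨
    6: lambda a, b, c: eq(a or b, b or a),                      # symmetry of ∨
    7: lambda a, b, c: eq(a or a, a),                           # idempotency of ∨
    8: lambda a, b, c: eq(a or eq(b, c), eq(a or b, a or c)),   # ∨ over ≡
    9: lambda a, b, c: a or (not a),                            # excluded middle
    10: lambda a, b, c: eq(a and b, eq(a, eq(b, a or b))),      # golden rule
    11: lambda a, b, c: eq(imp(a, b), eq(a or b, b)),           # implication
}

def failing_axioms():
    """Numbers of the schemata that take value f for some values of A, B, C."""
    return [
        n for n, f in AXIOMS.items()
        if not all(f(a, b, c) for a, b, c in product([True, False], repeat=3))
    ]

print(failing_axioms())  # prints: []
```

An empty list means that every instance of every schema is a tautology. This is a semantic check only and plays no role in the syntactic calculus itself.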


The axioms of Λ, here, however, formulated in their schemata edition, are those
that are customary in the “equational” (or “calculational”) approach to doing logic,
as presented, e.g., in [17] and [32], but with some minor differences, besides the
essential one that [17] does not use schemata. For example, [17] uses ⊤ ≡ A ≡ A
instead of (3). Moreover, our choice for (4) is natural, and hence easy to remember:
It intuitively says “negating A is tantamount to saying that it is false” (A ≡ ⊥). But
[17] adopts instead a different axiom that eventually implies our (4): I quote it, but
in schema form, “¬(A ≡ B) ≡ ¬A ≡ B”. This being intuitively less clear is also
less memorable than the rest.

We are ready to calculate! (Compare with Definition 1.1.3 and Remark 1.1.6(2).) 


1.4.5 Definition. (Theorem-Calculations, or Proofs) Let Γ be an arbitrary, given
set of formulae.

A theorem-calculation (or proof) from Γ is any finite (ordered) sequence of for-
mulae that we may write respecting the following two requirements:


In any stage we may write down
Pr1 Any member of Λ or Γ


Pr2 Any formula that appears in the denominator of an instance of a rule Inf1–Inf2
as long as all the formulae in the numerator of the same instance of the (same)
rule have already been written down at an earlier stage


We may call a proof from Γ by the alternative name Γ-proof. □
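The two requirements can be turned into a mechanical proof checker. The sketch below is a simplification under loudly stated assumptions: formulae use a hypothetical tuple encoding (`("equiv", A, B)` for A ≡ B, `("not", C)`, `(op, C, D)`, strings for atoms), and membership in Λ or Γ is not recognized from the schemata but delegated to a caller-supplied finite set `allowed`. The function names are ours.

```python
def is_equiv(f):
    """True iff f has the form (A ≡ B)."""
    return isinstance(f, tuple) and len(f) == 3 and f[0] == "equiv"

def leibniz_related(x, y, a, b):
    """True iff some C and p give C[p := A] = x and C[p := B] = y."""
    if x == y or (x == a and y == b):
        return True  # C is x itself (p fresh), or C is the variable p
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0]:
        return all(leibniz_related(s, t, a, b) for s, t in zip(x[1:], y[1:]))
    return False

def check_proof(lines, allowed):
    """True iff each formula in lines is justified by Pr1 or Pr2."""
    for i, e in enumerate(lines):
        earlier = lines[:i]
        by_pr1 = e in allowed  # stands in for "member of Λ or Γ"
        by_inf2 = any(
            is_equiv(f) and f[2] == e and f[1] in earlier for f in earlier
        )
        by_inf1 = is_equiv(e) and any(
            is_equiv(f) and leibniz_related(e[1], e[2], f[1], f[2])
            for f in earlier
        )
        if not (by_pr1 or by_inf2 or by_inf1):
            return False
    return True

# Hypothetical Γ = {p, p ≡ q}; the sequence p, p ≡ q, q is a Γ-proof of q.
hyp = ("equiv", "p", "q")
print(check_proof(["p", hyp, "q"], {"p", hyp}))
```

The checker never evaluates a formula; it only compares shapes, which is the point of item (1) in the remark that follows the definition.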


1.4.6 Remark. (1) By definition, a Γ-proof is a purely form-manipulation construc-
tion without any reference to semantic concepts, such as t, f, etc.

(2) We will call Γ the set of special axioms (Λ contains the “general” axioms).
Special axioms are also called hypotheses or assumptions. Clearly, while Λ is reserved
and “frozen” in the first part of this book, Γ can vary from subject to subject.

(3) Any member of Γ that is not also in Λ we will call a nonlogical axiom.⁴⁹

(4) Since any theorem-calculation from some Γ is a finite sequence of formulae,
only a finite part of Γ and Λ may appear in such a calculation. This is entirely
analogous with what happens in a formula-calculation: Each uses only a finite
number of formal variables even though we have an infinite supply of those. □


1.4.7 Definition. (Theorems) Any formula A that appears in a Γ-proof is called a
Γ-theorem. We write Γ ⊢ A to indicate this. If Γ is empty (Γ = ∅), i.e., we have no
special assumptions, then we simply write ⊢ A and call A just “a theorem”.
Caution! We may also do this out of laziness and call a Γ-theorem just “a theorem”
if the context makes clear which Γ ≠ ∅ we have in mind.

We say that A is an absolute, or logical theorem whenever Γ is empty. □


(1) Clearly, the symbol “⊢” of the metatheory formulates the predicate “is a theorem”.
(2) Definition 1.4.7 says that a formula is a theorem on one and only one condition:
It occurs in some Γ-proof. We say that such a proof proves A from Γ, or from the
hypotheses (of) Γ.


In the common parlance of mathematics we may also say that “Γ derives A”.


(3) Note how in the symbol “Γ ⊢ A” we take Λ for granted and do not mention it to the
left of ⊢.


1.4.8 Remark. Thus, a Γ-proof of a formula A is a sequence of formulae
B1, ..., Bn, A, C1, ..., Cm


obeying the requirements stated in 1.4.5. It is trivial that if we discard the “tail” part
“C1, ..., Cm” of the sequence, then


B1, ..., Bn, A    (1)


is still a proof. The reason is that every formula in a proof is either legitimized
outright (without reference to any other formulae) or is legitimized by reference to
formulae to its left. Thus (1) also proves A, since A occurs in it. This technicality
allows us to stop a proof as soon as we write down the formula that we wanted to
prove. □


So, 1.4.7 tells us what kind of theorems we have: 

1. Anything in Λ ∪ Γ.⁵⁰ 

2. For any formula C and variable p, the formula C[p := A] ≡ C[p := B], 
provided A ≡ B was written down already, and therefore is a (Γ-) theorem. 


3. B (any B), provided A ≡ B and A were written down already and therefore 
are both (Γ-) theorems. 


⁴⁹ That is, it does not speak about logic itself; logic does not need this axiom in order to function properly. 
⁵⁰ We explained the notation “Λ ∪ Γ” in 1.3.14 on p. 36, item (3). 


Hey! The above is a recursive definition of (Γ-) theorems, and is worth recording 
(compare with Definition 1.1.7). 


1.4.9 Definition. (Theorems, Inductively) A formula E is a Γ-theorem iff E fulfills 
one of Th1–Th3 below: 


Th1 E is in Λ ∪ Γ. 


Th2 For some formula C and variable p, E is C[p := A] ≡ C[p := B], and (we 
know that) A ≡ B is a (Γ-) theorem. 


Th3 (We know that) A ≡ E and A are (Γ-) theorems. □ 


1.4.10 Remark. (Theorem vs. Metatheorem) In the expression Γ ⊢ A, “A” is the 
theorem, not “Γ ⊢ A”. After all, a theorem by definition has to be a single formula 
that appears somewhere in some proof (1.4.7). 

So what is “Γ ⊢ A” then? It is a metatheorem. It is a statement that we are 
making about our logic, about what the logic can do, if we take all the formulae in 
Γ as assumptions. It says, “there is a proof, which from assumptions Γ proves A.” 

But how does one establish the validity of such a meta-result as quoted immediately 
above? 

By proving within Boolean logic (according to Definition 1.4.5, that is) the theorem 
A using assumptions Γ. 

This action does two things at once: It proves A (from Γ) and it also metaproves 
the statement Γ ⊢ A. 

Nevertheless, people are mostly shy of such distinctions and it has become 
acceptable practice to say (by abuse of language) things like “I proved Γ ⊢ A” all the time. 
See also the start-up comment in the next chapter. □ 


1.4.11 Exercise. So that you can check your understanding of the concepts proof 
and theorem, show (i.e., present proofs) that 

(1) A ⊢ A, for any A. 

(2) A more general form of (1): If A is a member of Σ (also written “A ∈ Σ”), 
then Σ ⊢ A. 

(3) ⊢ B, for any axiom B. □ 


1.4.12 Remark. (Hilbert Proofs) A Γ-proof is also called a Hilbert proof after the 
name of the great mathematician David Hilbert, who essentially was the first serious 
proponent of the idea to use logic to do mathematics. Hilbert also helped to found 
modern logic in an axiomatic and rigorous setting, and defined the concept of proof, 
essentially as above.⁵¹ 


⁵¹ There are some inessential differences. Hilbert used a different set of logical axioms, and a single rule 
of inference. 


In practice we write a Hilbert proof vertically on the page, i.e., one formula on top 
of the other, numbering every formula. It is imperative that we provide annotations 
to explain what we are doing at every step and why. The numbering assists us in 
referring to previous formulae. 

All proofs, whether they are in the Hilbert style or in the equational style (the 
latter style will be introduced shortly), must be annotated. □ 


1.4.13 Example. (Some Very Simple Annotated Hilbert Proofs) 

(a) We will verify that “A, A ≡ B ⊢ B” for any formulae A and B. Thus we need 
to write a formal proof of B using A and A ≡ B as hypotheses (cf. 1.4.10). The part 
“for any formulae A and B” makes this result applicable to infinitely many instances, 
one for each specific choice of A and B. It is therefore a metatheorem schema. We 
could have said instead, “prove the schema A, A ≡ B ⊢ B”, a formulation where 
the part “for any formulae A and B” is redundant. 

The same comment applies to any theorems (and metatheorems) that we will prove 
where we have letters to stand for arbitrary formulae. 

Let us establish (a) now. Make sure you memorize the style! This is how you will 
write your own Hilbert proofs. Every line must be numbered and annotated! 


(1) A        (hypothesis) 
(2) A ≡ B    (hypothesis) 
(3) B        ((1) and (2) and Equanimity) 


Worth stating. It is clear from Definition 1.4.5 that assumptions can be scrambled. 
So we have also established A ≡ B, A ⊢ B by the very same proof. 

Can we also swap A and B in A ≡ B? It turns out that we can. But the above 
proof does not address this question: after all, B ≡ A is never mentioned. 

It is a fatal error to say, “but is not ‘≡’ symmetric? I can see that it is so from the 
truth table of p. 29.” 

No; you see, we have not connected our syntactic proofs with semantics yet. Until 
such time, the only things that we may assume that we can do must directly follow 
from 1.4.5 in connection with our axioms and rules of inference. 
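
The purely mechanical character of this check can be sketched in code. The Python fragment below is only an illustration and is not part of the formal apparatus of the text: the tuple encoding of formulae and the names `check`, `"hyp"` and `"eqn"` are assumptions made here, and only hypotheses and Equanimity are handled (no logical axioms, no Leibniz). It accepts proof (a), accepts it again with the hypotheses listed in the other order, and rejects the very same three lines when the hypothesis offered is B ≡ A, because nothing syntactic licenses the swap yet.

```python
# A formula is a string (an atom such as "A") or a tuple ("equiv", X, Y).
# This encoding and every name below are illustrative assumptions.

def check(hyps, proof):
    """Return True iff every line of `proof` is legitimate.

    `proof` is a list of (formula, annotation) pairs; line numbers are
    1-based, as in the text.  Supported annotations:
      ("hyp",)       the formula is one of the hypotheses
      ("eqn", i, j)  line i is X and line j is X ≡ formula (Equanimity)
    """
    for n, (formula, note) in enumerate(proof, start=1):
        if note[0] == "hyp":
            ok = formula in hyps
        elif note[0] == "eqn":
            i, j = note[1], note[2]
            # Both premises must already have been written down.
            ok = (1 <= i < n and 1 <= j < n
                  and proof[j - 1][0] == ("equiv", proof[i - 1][0], formula))
        else:
            ok = False
        if not ok:
            return False
    return True


A_equiv_B = ("equiv", "A", "B")
proof_a = [
    ("A", ("hyp",)),
    (A_equiv_B, ("hyp",)),
    ("B", ("eqn", 1, 2)),
]

accepted = check(["A", A_equiv_B], proof_a)
accepted_scrambled = check([A_equiv_B, "A"], proof_a)
# With B ≡ A as the hypothesis, line (2) is no longer a hypothesis.
accepted_swapped = check(["A", ("equiv", "B", "A")], proof_a)
```

The checker compares formulae as strings of symbols only, which is exactly the point made above: semantic facts about ≡ play no role until they have been derived.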


(b) We next (meta)prove A ≡ B ⊢ C[p := A] ≡ C[p := B]. 


(1) A ≡ B                      (hypothesis) 
(2) C[p := A] ≡ C[p := B]      ((1) and Leibniz) 


Hmm. Is there a pattern here? Indeed there is! Any rule like (R) of p. 40 leads to 
the statement of provability 


P₁, P₂, ..., Pₙ ⊢ Q    (R′) 


as the proof 
P₁, P₂, ..., Pₙ, Q 
establishes. This once we have written a Hilbert proof horizontally and without 
annotation, in order to expedite the obvious. 

That this is a {P₁, P₂, ..., Pₙ}-proof of Q is clear: Referring to Definition 1.4.5 we 
see that writing P₁, P₂, ..., Pₙ (indeed doing so in any order if we wish) is legitimate 
by Pr1. Then following this by writing Q is also legitimate by an application of 
rule (R) on the previously written formulae. 


This is why in some of the literature (e.g., [43]) rules of inference are written as 
in (R′) above rather than in “fraction form”. Tradition has it that derived rules of 
inference are always written in the style (R′). After all, such a rule is a (meta)provable 
principle and says that from certain assumptions we can prove a certain conclusion. 


(c) We (meta)prove a derived rule of inference, called transitivity. It is 


A ≡ B, B ≡ C ⊢ A ≡ C    (Transitivity) 

Here it goes: 
(1) A ≡ B                  (hypothesis) 
(2) B ≡ C                  (hypothesis) 
(3) (A ≡ B) ≡ (A ≡ C)      ((2) and Leibniz, denom. “A ≡ p” where p is fresh) 
(4) A ≡ C                  ((1) and (3) and Equanimity) 


What’s this about “fresh”? This means that p does not occur in any of A, B, C. 
Actually, all I need here is that it just does not occur in A, but it takes less space to 
say it is fresh in the annotation, so I can fit it in one line! 

I want p not to occur in A so that when I do “(A ≡ p)[p := B] ≡ (A ≡ p)[p := C]” 
I am guaranteed that I get line (3). If p does occur in A then the substitutions 
will change A and I will not get line (3)! See also 1.3.17. 


Pause. But what if I cannot find a fresh p? Actually, I always can, since I have 
an infinite supply of variables, while only finitely many appear in A, B,C. 
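
The role of freshness can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: formulae are encoded as nested tuples with strings for variables, the variable supply is taken to be p0, p1, p2, ..., and the names `variables`, `fresh` and `subst` are hypothetical helpers rather than notation from the text. It builds line (3) of the transitivity proof from the Leibniz "denominator" A ≡ p, once with a fresh p and once with a p that occurs in A.

```python
from itertools import count

def variables(f):
    """Set of variable names occurring in formula f (a string or a tuple)."""
    if isinstance(f, str):
        return {f}
    return set().union(*(variables(x) for x in f[1:]))

def fresh(*formulas):
    """First of p0, p1, p2, ... that occurs in none of `formulas`."""
    used = set().union(*(variables(f) for f in formulas))
    return next(v for v in (f"p{k}" for k in count()) if v not in used)

def subst(c, p, a):
    """c[p := a]: replace every occurrence of variable p in c by a."""
    if isinstance(c, str):
        return a if c == p else c
    return (c[0],) + tuple(subst(x, p, a) for x in c[1:])

# Concrete stand-ins for the metavariables A, B, C of the text.
A = ("or", "p0", "r")
B = "s"
C = "t"
expected = ("equiv", ("equiv", A, B), ("equiv", A, C))   # line (3)

p = fresh(A, B, C)                     # p0 is taken, so this is "p1"
denom = ("equiv", A, p)                # the "denominator" A ≡ p
line3 = ("equiv", subst(denom, p, B), subst(denom, p, C))

# Choosing a p that occurs in A changes A itself under substitution.
bad_denom = ("equiv", A, "p0")
bad = ("equiv", subst(bad_denom, "p0", B), subst(bad_denom, "p0", C))
```

Here `line3` coincides with `expected`, while `bad` does not, since the substitution also rewrote the occurrence of p0 inside A.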


(d) We next prove the theorem (schema) A ≡ A. Note that I mentioned no 
assumptions. This is an absolute result. 

Another way to say this is “metaprove ⊢ A ≡ A”, or if you hate to say “meta”, 
say instead, “establish that ⊢ A ≡ A” or “show that ⊢ A ≡ A”. 


(1) A ∨ A ≡ A    (axiom) 
(2) A ≡ A        ((1) and Leib: A[p := A ∨ A] ≡ A[p := A] where p is fresh) 


A few remarks: (i) We may use any logical axiom of the form “⋯ ≡ ⋯” in place 
of A ∨ A ≡ A in the above proof. By the way, this is our first proof where we used a 
logical axiom. 

(ii) For logical axioms our annotation will just be “axiom”. For special axioms, 
our annotation will always be “assumption” or “hypothesis”. 

(iii) We do not need to name the axioms (idempotent, etc.) in our annotations, and 
certainly I do not want you to memorize their numbers in the list! (What if I scramble 
the list?!) 

But we must be truthful. Writing, say, “A ≡ B” and annotating “axiom” will not 
get us anywhere. 

(iv) Rules must be named, and we must annotate how they are applied! In what 
follows we will make a habit of abbreviating rule names. For example, “Leibniz” 
will be Leib and “Equanimity” will be Eqn. Transitivity will be Trans. 


In the next chapter we systematically prove several theorems (in almost all cases 
schemata) and metatheorems to enrich our toolbox and enhance our familiarity with 
the methodology. We also introduce the equational style of proof. □ 


1.5 ADDITIONAL EXERCISES 


1. Which of the following are Boolean formulae? Why? (Do not use any general 
principles; just try to give a good reason based on the definition of 
formula-calculation (1.1.3)). 


ep 
e (p) 


ep 7d 
° (p> 4q) 
2. Are the following string sequences formula-calculations? Why? 
• p, ⊤, (p ∨ ⊥), ⊥ 
• p, ⊥, (p ∨ ⊥), ⊤ 
3. Give a formula-calculation for (-(@ vq> 1)) 


4. Prove either by analyzing formula-calculations or by induction on formulae that 
the string () is not a formula. 


5. Prove either by analyzing formula-calculations or by induction on formulae that 
the string ((¬⊥)) is not a formula. 


6. True or false, and why? “If A is a formula, then so is (A).” 


7. Show by induction on formulae, or by analyzing formula-calculations, that every 
Boolean formula must contain at least one Boolean variable or one Boolean 
constant. 


8. Prove that the complexity of a Boolean formula (correctly written as required 
by 1.1.5) equals the number of its left brackets. 


9. (a) Prove that the last symbol of a Boolean formula is never ∧. 

(b) Prove that the string ∧∨ never occurs as part of a Boolean formula. 

The proof of each part must be either by induction on the complexity of formulae, 
or by analyzing formula-calculations. In part (b), you may use the result of part (a). 


10. Which of the following schemata are tautologies? Show all work and remember 
that to show that a schema is not a tautology we must identify an instance of it 
that is not a tautology. 

I am not using all the brackets required by 1.1.5. 
• ((A → B) → A) → A 
• A ∧ B → A ∨ B 
• A ∨ B → A ∧ B 
• A → B ≡ ¬B → ¬A 
• A ∧ (B ≡ C) ≡ A ∧ B ≡ A ∧ C 
• A ∨ (B ≡ C) ≡ A ∨ B ≡ A ∨ C 


11. Using truth tables or truth table shortcuts, determine the validity of the following. 
Show all your work. Again, a schema is not a tautological implication iff some 
instance of it is not. 

• p ⊨_taut p ∧ q 
• A, B ⊨_taut A ∧ B 
• A, A → B ⊨_taut B 
• B, A → B ⊨_taut A 
• p ∧ q ⊨_taut p 


12. Use a truth table, or a shortcut of one, to show that 

⊨_taut (A ∧ B ∧ C → D) ≡ (A → (B → (C → D))) 


13. Use a truth table, or a shortcut of one, to show that 


C, A → (B ≡ C) ⊨_taut A → B 


14. Which of the following sets is satisfiable? 


• {A, A → B, A → ¬B} 
• {A ∨ B, ¬A ∨ C, ¬B, ¬C} 
• {A ∨ B, ¬A ∨ C, B ∨ C} 


15. Calculate the following (show/explain all work!). 


NB. The first bullet below must be done using Definition 1.3.15, step by step. For 
the rest you are free to rely on the intuitive definition of substitution/replacement. 
Some of the replacements I ask you to do may be illegal. If so, explain precisely 
why they are illegal and don’t do them! 


Review priorities! 


• p ∨ (q → p)[p := r] 

(pV q)[p := ¢ 

• (p ∨ q)[p := ⊤] 

• p ∨ q ∧ r[q := A] (where A is some formula, we don’t care which) 


• p ∨ (q ∧ r)[q := A] (where A is some formula, we don’t care which) 


16. If A ⊨_taut B and also B ⊨_taut A, then we say that A and B are tautologically 
equivalent. Prove that every formula is tautologically equivalent to one that 
contains no constants (⊥, ⊤) and moreover, the only connectives in it are ¬ and ∨. 


17. Prove that every formula is tautologically equivalent to one that contains no 
constants (⊥, ⊤) and moreover, the only connectives in it are ¬ and ∧. 


18. Prove that every formula is tautologically equivalent to one that does not contain 
the constant ⊤ and moreover, the only connective in it is →. 


19. Let us introduce a new Boolean connective ↓ by “A ↓ B means ¬(A ∨ B)”. Prove 
that every formula is tautologically equivalent to one that contains no constants 
(⊥, ⊤) and moreover, the only connective in it is ↓. 


20. Let us introduce a new Boolean connective ↑ by “A ↑ B means ¬(A ∧ B)”. Prove 
that every formula is tautologically equivalent to one that contains no constants 
(⊥, ⊤) and moreover, the only connective in it is ↑. 


CHAPTER 2 


THEOREMS AND METATHEOREMS 


2.1 MORE HILBERT-STYLE PROOFS 


Before we begin. A word on the use of the headings “theorem” and “metatheorem”. 
In what follows we will prove several theorems and a few metatheorems. Some of 
the theorems will be absolute (no assumptions beyond logical axioms used) and some 
will be relative. These will be tersely stated with headings like “theorem” and the 
text of the result will have the format “⊢ A” or “Γ ⊢ A” respectively. The theorem 
in each case, announced by the heading “theorem”, is just “A” (cf. the discussion 
in Remark 1.4.10); thus the heading is purposely abusing terminology, but this 
conforms with normal practice. 

On occasion we will establish statements of the form “if Γ ⊢ A, then also Δ ⊢ B”. 
This type of result will be a metatheorem that will appear with such a heading. 


2.1.1 Metatheorem. (Hypothesis Strengthening) If Γ ⊢ A and Γ ⊆ Δ, then also 
Δ ⊢ A. 
Note. Γ ⊆ Δ means that every formula of Γ also occurs inside Δ. 
Proof. Any Γ-proof of A is also a Δ-proof, for whenever the legitimacy of writing 
down a formula B in the proof is by virtue of B being in Γ, then this precise step 
would also be legitimate if we were composing a Δ-proof, since B is also in Δ. The 
other two legitimate reasons for writing a formula down are independent of our choice 
of assumptions (cf. 1.4.5). □ 


2.1.2 Remark. In particular, if ⊢ A, then also Γ ⊢ A for any set of formulae Γ. This 
is because ∅ ⊆ Γ vacuously.⁵² □ 


2.1.3 Exercise. Prove the content of the above remark not as a corollary of 2.1.1 but 
directly from 1.4.7 or 1.4.9. □ 


2.1.4 Metatheorem. (Transitivity of ⊢) Suppose that we have Γ ⊢ B₁, Γ ⊢ B₂, ..., 
Γ ⊢ Bₙ. Suppose, moreover, that we also have B₁, ..., Bₙ ⊢ A. Then Γ ⊢ A. 


Proof. By assumption we have Γ-proofs (cf. 1.4.8) 

..., B₁    (1) 
..., B₂    (2) 
   ⋮ 
..., Bₙ    (n) 

We also have a B₁, ..., Bₙ-proof 

..., A    (n + 1) 


We now concatenate all proofs (1)–(n) (in any order will be fine) and append to the 
end of the result proof (n + 1) to form the sequence 


[..., B₁] [..., B₂] ... [..., Bₙ] [..., A]    (∗)⁵³ 


In (∗) every formula C satisfies either Case 1 or Case 2: 


Case 1: Is in a sequence among (1)–(n). Thus C is either written outright (because 
it is in Γ ∪ Λ) or because it follows from a previous formula, via Eqn or Leib. 
This “previous” consideration is localized in the sequence ((i), i = 1, ..., n) 
where the formula belongs. 


Case 2: Is in the sequence (n + 1). Thus C is either written outright (because it 
is in Λ or is one of the Bᵢ) or because it follows from a previous formula, 


localized in the sequence (n + 1), via Eqn or Leib. We would like to say that 
(∗) is a Γ-proof and rest the case. The only part above, while checking the 
sequence for legitimacy, that may bother us momentarily is the part “or is one 
of the Bᵢ” in Case 2 above: Writing Bᵢ outright is fine for a B₁, ..., Bₙ-proof but 
may not be for a Γ-proof since we have no guarantee that Bᵢ is in Γ. 


No problem: Any Bᵢ that is written down in the (last) segment [..., A] of 
the sequence (∗) is legitimate in the Γ-proof context. Indeed, it was already 
legitimized as the last formula of the segment [..., Bᵢ]. So (∗) is a Γ-proof, 
and thus A is a Γ-theorem. □ 


⁵² I trust that you cannot think of any member of ∅ that is not in Γ. 
⁵³ The boxes are inserted to improve readability. 


2.1.5 Remark. The above metatheorem makes derived rules of inference usable. 
While primary rules, by definition (1.4.5), are applicable to any formulae that appear 
in a proof, it was not a priori clear that this applies to derived rules such as 
“A ≡ B, B ≡ C ⊢ A ≡ C”. Now it is. Any proof that looks like 


..., A ≡ B, ..., B ≡ C, ... 


or like 

..., B ≡ C, ..., A ≡ B, ... 


can be, if it fits our purposes, continued as 

..., A ≡ B, ..., B ≡ C, ..., A ≡ C 

or 

..., B ≡ C, ..., A ≡ B, ..., A ≡ C 

respectively. □ 
Metatheorem 2.1.4 has a very important corollary: 


2.1.6 Corollary. If Γ ∪ {A} ⊢ B and also Γ ⊢ A, then Γ ⊢ B. 


The notation Γ ∪ {A} was introduced in 1.3.14(3). A shorter form of it often used in 
writings in logic is “Γ + A”. 


Proof. The proof of B from Γ ∪ {A} utilizes, besides A, only a finite number of 
formulae from Γ, because every proof has a finite number of steps, so it can use 
only finitely many formulae. Say the formulae used from Γ are 

C₁, C₂, C₃, ..., Cₙ 

Thus the proof of B is from {C₁, C₂, C₃, ..., Cₙ, A}, that is, 

C₁, C₂, C₃, ..., Cₙ, A ⊢ B    (1) 

Since for any D in Γ we have Γ ⊢ D (cf. 1.4.11), it follows that 

Γ ⊢ C₁, Γ ⊢ C₂, Γ ⊢ C₃, ..., Γ ⊢ Cₙ    (2) 

Statements (1), (2), and the given Γ ⊢ A jointly satisfy the hypotheses of 2.1.4. Thus 
we have at once that Γ ⊢ B. □ 


2.1.7 Corollary. If Γ ∪ {A} ⊢ B and also ⊢ A, then Γ ⊢ B. 


Proof. By 2.1.1, the hypothesis ⊢ A can be replaced by Γ ⊢ A. Corollary 2.1.6 
concludes the argument. □ 


Corollary 2.1.6 essentially says that in a proof (from Γ) we are allowed to write 
down, not only (1)–(3), that is, (1) any axiom, (2) any member of Γ, (3) any result of 
an inference rule applied to already-written-down formulae, but also we may write 
down (4) any Γ-theorem (like A above). 


In other words, the corollary justifies the legitimacy of quoting or using in proofs 
already-proved theorems without having to prove them all over again every time we 
need to use them! 


2.1.8 Exercise. Show that if Γ ∪ {A} ⊢ B and also Δ ⊢ A, then Γ ∪ Δ ⊢ B. □ 


All the preceding metatheorems in this section, and the exercise above, are 
independent of the choice of logical axioms and rules of inference as is clear from their 
proofs. 


We next turn to theorems and metatheorems relating to ≡. 


2.1.9 Theorem. ⊢ (A ≡ (B ≡ C)) ≡ ((A ≡ B) ≡ C) 
Note. This is the mirror image of axiom schema (1). 
Proof. 

(1) ((A ≡ B) ≡ C) ≡ (A ≡ (B ≡ C))    (axiom) 

(2) (((A ≡ B) ≡ C) ≡ (A ≡ (B ≡ C))) 
      ≡ ((A ≡ (B ≡ C)) ≡ ((A ≡ B) ≡ C))    (axiom) 

(3) (A ≡ (B ≡ C)) ≡ ((A ≡ B) ≡ C)    ((1, 2) + Eqn) □ 

2.1.10 Remark. The above theorem/axiom (1) pair allows us to insert brackets in 
any way we please in a chain of ≡ signs, or not insert them at all, since it does not 
matter one way or another. 


If you believed, as you should, what I have just said above, then you may skip this 
part. Strictly speaking, the pair of 2.1.9 and axiom (1) deal with a chain of two ≡ 
signs.⁵⁴ The general case can be dealt with by (strong) induction on the length (i.e., 
number of Aᵢ) in the chain. The basis (n = 3) being settled as noted, we turn to the 
case of n > 3, where we want to show that 


⊢ A₁ ≡ A₂ ≡ ⋯ ≡ Aₙ ≡ (A₁, A₂, ..., Aₙ)    (0) 


⁵⁴ Recall that “A ≡ B ≡ C” is the least-parenthesized notation for “(A ≡ (B ≡ C))” according 
to 1.1.11. 

The notation “(A₁, A₂, ..., Aₙ)” indicates, solely for the purposes of the exposition 
here, an ≡-chain of Aᵢ in the order given, where brackets (other than the indicated 
outermost pair) are arbitrarily placed (we do not care how). Of course, the left-hand 
side implies that brackets are present and inserted from right to left (cf. 1.1.11). We 
have two cases pertaining to the right-hand side of (0), and these depend on where 
the last ≡ was inserted (last in a formula-calculation of (A₁, A₂, ..., Aₙ), that is): 


Case 1:⁵⁵ The right-hand side of (0) is (omitting outermost brackets) 

A₁ ≡ D    (1) 

where D denotes the ≡-chain (A₂, ..., Aₙ). By the I.H. 

⊢ A₂ ≡ ⋯ ≡ Aₙ ≡ D    (2) 


By 1.4.9, using the above in an application of Leibniz with “denominator” A₁ ≡ p 
(p fresh) I obtain 


⊢ (A₁ ≡ (A₂ ≡ ⋯ ≡ Aₙ)) ≡ (A₁ ≡ D) 


which, by right associativity of ≡, is (0). 

Case 2:⁵⁶ The right-hand side of (0) is (omitting outermost brackets) 


(A₁, ..., Aₖ) ≡ E    (3) 


where k > 1 and E denotes (Aₖ₊₁, ..., Aₙ). By the comment that led to the two 
cases, E is nonempty, i.e., k < n. By the I.H. 


⊢ A₁ ≡ (A₂ ≡ ⋯ ≡ Aₖ) ≡ (A₁, ..., Aₖ)    (4)⁵⁷ 


By 1.4.9, an application of Leibniz with “denominator” p ≡ E (p fresh) yields from 
(“numerator”) (4): 


⊢ ((A₁ ≡ (A₂ ≡ ⋯ ≡ Aₖ)) ≡ E) ≡ ((A₁, ..., Aₖ) ≡ E)    (5) 

By Theorem 2.1.9, 

⊢ (A₁ ≡ ((A₂ ≡ ⋯ ≡ Aₖ) ≡ E)) ≡ ((A₁ ≡ (A₂ ≡ ⋯ ≡ Aₖ)) ≡ E)    (6) 


⁵⁵ The last inserted ≡ is the leftmost. 

⁵⁶ The last inserted ≡ is not the leftmost. 

⁵⁷ Same as “⊢ A₁ ≡ ⋯ ≡ Aₖ ≡ (A₁, ..., Aₖ)”. The brackets around A₂ ≡ ⋯ ≡ Aₖ are inserted 
only for emphasis; cf. 1.1.11. 

Remembering 2.1.5, and using 1.4.9 with an application of the derived rule (Trans) 
(p. 47), (5) and (6) yield 


⊢ (A₁ ≡ ((A₂ ≡ ⋯ ≡ Aₖ) ≡ E)) ≡ ((A₁, ..., Aₖ) ≡ E)    (7) 


By Case 1, 


⊢ A₁ ≡ A₂ ≡ ⋯ ≡ Aₙ ≡ (A₁ ≡ ((A₂ ≡ ⋯ ≡ Aₖ) ≡ E))    (8) 


thus, by an application of 1.4.9 and (Trans) on (7) and (8), we get (0). 
Thus, in a chain of any number of ≡ signs, we insert brackets merely to visually 
suggest what we have in mind, but for no other reason, as they have been shown to 
be redundant. □ 
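
As a sanity check on the claim (and in no way a substitute for the syntactic argument above, which does not appeal to truth values), one can enumerate every way of bracketing a short ≡-chain and compare truth tables. The Python sketch below does this for a chain of four atoms; the tuple encoding and the names `bracketings` and `value` are assumptions made for this illustration only.

```python
from itertools import product

def bracketings(atoms):
    """All fully bracketed ≡-chains over `atoms`, keeping their order."""
    if len(atoms) == 1:
        return [atoms[0]]
    out = []
    # k marks where the last-inserted ≡ splits the chain.
    for k in range(1, len(atoms)):
        for left in bracketings(atoms[:k]):
            for right in bracketings(atoms[k:]):
                out.append(("equiv", left, right))
    return out

def value(f, v):
    """Truth value of f under the assignment v (a dict from atoms to bools)."""
    if isinstance(f, str):
        return v[f]
    return value(f[1], v) == value(f[2], v)

atoms = ["A1", "A2", "A3", "A4"]
forms = bracketings(atoms)

# For each of the 16 assignments, all bracketings must receive one value.
agree = all(
    len({value(f, dict(zip(atoms, bits))) for f in forms}) == 1
    for bits in product([False, True], repeat=len(atoms))
)
```

The split index `k` in `bracketings` plays the same role as the case distinction in the remark: `k = 1` corresponds to Case 1 and `k > 1` to Case 2.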


2.1.11 Theorem. (The Other Equanimity) B, A ≡ B ⊢ A 


Proof. 
(1) B                      (hypothesis) 
(2) A ≡ B                  (hypothesis) 
(3) (A ≡ B) ≡ (B ≡ A)      (axiom) 
(4) B ≡ A                  ((2, 3) + Eqn) 
(5) A                      ((1, 4) + Eqn) □ 


The original primary “Eqn” says that if I assume an equivalence and the left 
formula of the equivalence, then I can conclude the right. This derived rule says 
that if I assume an equivalence and the right formula of the equivalence, then I can 
conclude the left. 

In all that follows we will call “Eqn” either the primary (Inf2) or the derived one 
without notice. 


2.1.12 Theorem. ⊢ A ≡ A 


Proof. We have already proved this schema in 1.4.13. We have stated it here again 
because it is important. □ 


2.1.13 Exercise. We proved the above using Leib and the trick that A[p := B] 
expands to A if p does not occur in A. This time, prove the result without the trick, 
but be careful not to introduce any circularities in your proof! □ 


2.1.14 Corollary. ⊢ ⊥ ≡ ⊥ 


Proof. This is a specific instance of the theorem schema above. It is one of our 
very few examples of “theorems” as opposed to (the majority that are) “theorem 
schemata”. □ 


2.1.15 Corollary. ⊢ ⊤ 
Proof. 

(1) ⊤ ≡ ⊥ ≡ ⊥    (axiom) 
(2) ⊥ ≡ ⊥        (absolute theorem, cf. 2.1.6) 
(3) ⊤            ((1) and (2) and Eqn) □ 

Worth Repeating: (Γ-) theorems can be inserted in any (Γ-) proof just like axioms 
are. This is due to Corollary 2.1.6 and has been employed above. In the above case 
Γ = ∅. 


2.1.16 Theorem. (Eqn + Leib Merged) C[p := A], A ≡ B ⊢ C[p := B] 


Proof. 
(1) C[p := A]                  (hypothesis) 
(2) A ≡ B                      (hypothesis) 
(3) C[p := A] ≡ C[p := B]      ((2) + Leib) 
(4) C[p := B]                  ((1, 3) + Eqn) □ 


In our annotations we will call the above derived rule, whenever used, “Eqn/Leib” or 
“Leib/Eqn”. 


2.1.17 Remark. (Equivalent Logics) A Boolean logic is determined by its language, 
its logical axioms and its rules of inference. Fixing the language, two different 
choices of the remaining tools lead to two “different logics” as we say. These are 
equivalent if and only if they have exactly the same theorems. That is, one can do 
with the first set of tools precisely what one can do with the second. 

Compare with programming languages: There is a theorem (one can encounter 
this theorem in a variety of courses, including possibly data structures, or theory of 
computation) that the pair of instructions “if ... then ... else ...” and “goto ...” 
have exactly the same power as the pair “if ... then ... else ...” and “while ... do 
...”. That is, whatever one can do (i.e., program) using one, one can do using the 
other. 

How does one prove such results? By simulation. One proves that each set can 
simulate or “implement” the other. 

Back to logic, we just saw that Eqn and Leib can simulate “Eqn/Leib”, so whatever 
we can do (i.e., prove) using the Eqn/Leib we can also prove using Eqn and Leib, i.e., 
our original logic. 

It turns out that, conversely, Eqn/Leib can simulate each of Eqn and Leib. Thus if 
we were to make an about face and drop Inf1 and Inf2 and adopt instead Eqn/Leib 
as our only primary rule, keeping the same logical axioms, we would get precisely 
the same theorems in the new logic as in our current logic. The new logic would be 
equivalent to the original ([32] uses Eqn/Leib as primary). 

While we will not do that, and will continue as planned with Inf1 and Inf2 as the 
primary rules (where Eqn/Leib is just a derived rule), it is nevertheless instructive 
to see the truth of my claim. 


(I) Simulation of Eqn by Eqn/Leib: 


(1) A        (hypothesis) 
(2) A ≡ B    (hypothesis) 
(3) B        ((1, 2) + Eqn/Leib: A is p[p := A] and B is p[p := B]) 


(II) Simulation of Leib by Eqn/Leib: Preparatory discussion: We want to 
(meta)prove A ≡ B ⊢ C[p := A] ≡ C[p := B] using the axioms, but no rule other 
than Eqn/Leib. So we pick arbitrary A, B, C and p. Let us also pick a q that is not 
p and does not occur in any of A, C. Clearly, the substitution C[p := A] has the 
same result as C[p := q][q := A].⁵⁸ That is, first change p into q everywhere in C 
and, after that, change q everywhere in the resulting formula into A. Similarly, the 
substitution C[p := B] has the same result as C[p := q][q := B]. 


(1) A ≡ B                              (hypothesis) 
(2) C[p := A] ≡ C[p := q][q := A]      (theorem (2.1.12); cf. discussion above) 
(3) C[p := A] ≡ C[p := q][q := B]      ((1, 2) + Eqn/Leib) 


In view of the preparatory remarks, the formula in line (3) is C[p := A] ≡ C[p := B] 
as needed, and line (2) is (C[p := A] ≡ C[p := q])[q := A] while line (3) is 
(C[p := A] ≡ C[p := q])[q := B] as required for the proper application of 
Eqn/Leib. By this I mean the requirement that “[p := A]” and “[p := B]” each apply 
to the same entire formula, not to some part thereof. Note that the assumption on q and 
A, C guarantees that q does not occur in C[p := A] and therefore C[p := A][q := A] 
and C[p := A][q := B] each give the same result as C[p := A]. 


Fatal Error! In the proof above, 2.1.12 was used to metaprove Leib (Inf1), yet Leib 
was used to prove ⊢ A ≡ A (cf. (d) on p. 47). So the above is an invalid (circular) 
proof! 


Not really. It is not a circular proof because ⊢ A ≡ A is directly provable from 
Eqn/Leib and the given axioms! (I just wanted to tease you a bit.) Here’s how: 


(1) A ∨ A ≡ A    (axiom) 
(2) A ∨ A ≡ A    (axiom) 
(3) A ≡ A        (Eqn/Leib + (1, 2): The “C-part” is p ≡ A, with p fresh) □ 


⁵⁸ You don’t believe me? Then do Exercise 2.1.18. 

2.1.18 Exercise. Prove by induction on the complexity of C that if q is not p, nor 
does it occur in C, then for any formula A, C[p := A] and C[p := q][q := A] have 
the same result. □ 
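
Before attempting the induction, it may help to see the claim on one concrete instance. The following Python sketch is a spot-check and not a proof; the tuple encoding of formulae and the helper name `subst` are assumptions introduced here. It compares the one-step and two-step substitutions for a sample C, and then shows how the claim fails once q is allowed to occur in C.

```python
def subst(c, p, a):
    """c[p := a]: replace every occurrence of variable p in c by a."""
    if isinstance(c, str):
        return a if c == p else c
    return (c[0],) + tuple(subst(x, p, a) for x in c[1:])

# Sample data: C mentions p and r, but not q.
C = ("or", "p", ("equiv", "r", ("not", "p")))
A = ("and", "r", "s")

direct = subst(C, "p", A)                       # C[p := A]
two_step = subst(subst(C, "p", "q"), "q", A)    # C[p := q][q := A]
same = direct == two_step

# If q occurs in C, the second substitution also hits the original q.
C_bad = ("or", "p", "q")
clash = subst(C_bad, "p", A) == subst(subst(C_bad, "p", "q"), "q", A)
```

In the second case the two-step route yields A ∨ A rather than A ∨ q, which is why the hypothesis that q does not occur in C cannot be dropped.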


We conclude the section with some simple but important results. 


2.1.19 Theorem. ⊢ A ≡ A ≡ B ≡ B 


Note. By earlier remarks that hinge on the associativity of “≡” (cf. axiom (1) and 
Theorem 2.1.9 as well as Remark 2.1.10), the above can be read in various ways: 
⊢ (A ≡ A) ≡ (B ≡ B), ⊢ A ≡ (A ≡ (B ≡ B)), ⊢ ((A ≡ A) ≡ B) ≡ B, 
⊢ A ≡ (A ≡ B) ≡ B, etc. In the end, one will read it in the most convenient 
(goal-driven) way. 

Proof. Brackets below are inserted for clarity so as to drive our argument: 


(1) (A ≡ B ≡ B) ≡ A                              (axiom) 
(2) ((A ≡ B ≡ B) ≡ A) ≡ (A ≡ (A ≡ B ≡ B))        (the same axiom) 
(3) A ≡ (A ≡ B ≡ B)                              ((1, 2) + Eqn) □ 


2.1.20 Corollary. ⊢ ⊥ ≡ ⊥ ≡ B ≡ B and ⊢ A ≡ A ≡ ⊥ ≡ ⊥ 


Proof. Directly from the previous theorem schema, the first time making A specific 
(namely, ⊥), the second time making B specific. □ 

Of course, there is nothing in a name, and the corollary can be reformulated as 
“⊢ ⊥ ≡ ⊥ ≡ A ≡ A and ⊢ A ≡ A ≡ ⊥ ≡ ⊥”. 


2.1.21 Corollary. (Redundant True) ⊢ ⊤ ≡ A ≡ A and ⊢ A ≡ A ≡ ⊤ 


Proof. 


(1) ⊤ ≡ ⊥ ≡ ⊥        (axiom) 
(2) ⊥ ≡ ⊥ ≡ A ≡ A    (abs. theorem) 
(3) ⊤ ≡ A ≡ A        (Trans + (1, 2)) 


As for the other one, 


(1) ⊤ ≡ A ≡ A                      (abs. theorem above) 
(2) (⊤ ≡ A ≡ A) ≡ (A ≡ A ≡ ⊤)      (axiom) 
(3) A ≡ A ≡ ⊤                      (Eqn + (1, 2)) □ 


2.1.22 Remark. The import of “redundant true” is mostly felt in equational proofs 
that we will introduce in the next section. These proofs exploit the rule Leibniz in 
a process of “replacing equivalents by equivalents”. Thus if we view our theorem 
schema “A ≡ A ≡ ⊤” as “A ≡ (A ≡ ⊤)” we see that, in terms of replacing 
equivalents by equivalents, A is as good as A ≡ ⊤. Thus in an expression such as 
“A ≡ ⊤” the part “≡ ⊤” can be eliminated; it is “redundant” and its elimination 
simplifies the expression we are trying to prove. Conversely, it is often convenient to 
introduce “≡ ⊤” (or “⊤ ≡”) replacing A by A ≡ ⊤ (or ⊤ ≡ A). □ 


We state two easy (but again, important) metatheorems that flow directly from 
“redundant true” (2.1.21): 


2.1.23 Metatheorem. For any Γ and A, Γ ⊢ A iff Γ ⊢ A ≡ ⊤. 


Proof. Only if. From ⊢ A ≡ A ≡ ⊤ we get Γ ⊢ A ≡ A ≡ ⊤ by hypothesis 
strengthening (2.1.1). Then, from Γ ⊢ A, Γ ⊢ A ≡ A ≡ ⊤, and {A, A ≡ A ≡ ⊤} ⊢ 
A ≡ ⊤ (Eqn) we get Γ ⊢ A ≡ ⊤ by transitivity of ⊢ (2.1.4). 

If. From Γ ⊢ A ≡ ⊤, Γ ⊢ A ≡ A ≡ ⊤, and {A ≡ ⊤, A ≡ A ≡ ⊤} ⊢ A (Eqn) 
we get Γ ⊢ A by transitivity of ⊢ (2.1.4). □ 


2.1.24 Remark. The import of 2.1.23 lies within the special case A ⊢ A ≡ ⊤:⁵⁹ 
Whenever we work with (special) assumptions, any such assumed formula A can 
be replaced via Leibniz by ⊤. We will see applications of this remark in the next 
section. □ 


2.1.25 Metatheorem. For any Γ and A, B, if Γ ⊢ A and Γ ⊢ B, then Γ ⊢ A ≡ B. 
Proof. From Γ ⊢ A ≡ ⊤, Γ ⊢ ⊤ ≡ B, and Trans (using 2.1.4). □ 


2.2 EQUATIONAL-STYLE PROOFS 


In algebra and trigonometry we often prove identities by calculating, systematically 
replacing equals for equals, thus—assuming the calculation started with a known 
identity—preserving equality at every step. 

For this to work, we have an initial supply of identities (our “knowledge base”).⁶⁰ 
The technique may be, depending on the problem, one of the following: 

(1) Start with one side of “=” in “⋯ = ⋯” and calculate (replacing equals for 
equals) until you reach the other side. These “equals for equals” are among our 
known supply of identities. 

(2) Start with the entire “⋯ = ⋯” and calculate until you reach a known identity. 
Then so is the original equality, as it holds iff the one we reach does. 

(3) Start with each side separately and calculate until you reach in both cases the 
same formula. 

For example, to prove 1 + (tan x)² = (sec x)² you work as follows, using as 
knowledge base the identities 


tan x = sin x / cos x    (i) 


SCR LAL. 
⁶⁰ Recall that in algebra and trigonometry an “identity” is a formula of the type “⋯ = ⋯” that is true 
for all values of the variables. So, x² − y² = 0 is not an identity, but x² − y² = (x + y)(x − y) is. 

sec x = 1 / cos x    (ii) 

(sin x)² + (cos x)² = 1 (Pythagorean Theorem)    (iii) 


Here is our calculation. Note the annotation! 


    1 + (tan x)² 
  = (by (i)) 
    1 + (sin x / cos x)² 
  = (arithmetic) 
    ((sin x)² + (cos x)²) / (cos x)² 
  = (by (iii)) 
    1 / (cos x)² 
  = (by (ii)) 
    (sec x)²                                    (E) 

We can profitably mimic the above style of proof in logic. The presence of the 
Leibniz rule and the preponderance of axioms that involve “≡” make this possible, 
indeed easy. 
In logic, the role of “=” is taken over by “≡”. I cannot emphasize enough that the 
two are entirely different symbols and we will not confuse them. 


So what is an equational proof? It is a proof-layout methodology. 


(1) What it does not do: It does not supplement, amend, or replace the concept 
of proof or theorem-calculation of Definition 1.4.5. Nor does it do so for the concept 
of (Γ-) theorem (1.4.7). Both concepts remain the same. 

Compare with algebra and trigonometry: The calculation (E) above does not 
define the concept of proof in these branches of mathematics, but it does provide a 
template for many nice proofs within the normal framework of mathematical proof: 
certainly the proof (E) is acceptable as such. 

Compare also with programming. Some people using, say, the procedural language
Pascal may choose to adopt the structured programming methodology and in
particular never to use the goto instruction. Others may opt to use goto and follow
program development by flowcharts, or indeed use a mixed approach, having their
pie and eating it too: Do structured programming with goto (cf. [28]).

The programs written by the three groups, for the same problems, will look drastically
different. However, the existence of these groups and the two methodologies
(along with the hybrid methodology) for writing programs does not detract from
the fact that all use the very same programming language. Pascal programs have a
formal, i.e., syntactic, definition of their structure that is independent of how people
plan to use them.

The same goes for logic. Nevertheless, quite analogously, one can dogmatically
stick to the Hilbert style of writing and annotating proofs (the “orthodoxy” according
to Definition 1.4.5), or instead absolutely insist on writing every single proof under
the sun in the equational style.61

Then again, the smart user of logic will accept both styles. Such a person will 
judiciously choose the best tool for the task at hand each time, be it equational style or 
Hilbert style. The more tools we allow ourselves to use, the more effective “provers” 
we will be. 

(2) What it does: In many cases it simplifies proofs by allowing a goal-driven 
approach toward the theorem. 

(3) What it is: An equational-style proof is a sequence of (Γ-) theorems of the
form

    A₁ ≡ A₂, A₂ ≡ A₃, ..., Aₙ₋₁ ≡ Aₙ, Aₙ ≡ Aₙ₊₁                    (1)

Each of the individual theorems Aᵢ ≡ Aᵢ₊₁ must receive an independent, individual
(Γ-) proof in order to be allowed to appear in sequence (1).

By an independent, individual proof I mean that this proof is external to the sequence,
in general, and is not the result of things that we wrote to the left of Aᵢ ≡ Aᵢ₊₁ in
the sequence. Sequence (1) is not a Hilbert-style proof!

Exactly how, and with what layout methodology, we obtained each individual proof
for the various Aᵢ ≡ Aᵢ₊₁ is totally flexible: Some or all of these may have been
proved by equational-style proofs. Some or all may have been proved by Hilbert-style
proofs.

These individually proved results, Aᵢ ≡ Aᵢ₊₁, are from our (growing) database
of theorems, which we may use in a proof like (1) just like the results (i)–(iii) were
part of the database of independently obtained trigonometry facts that were used in
our proof (E) on p. 61.

Pause. For one last time: A theorem is a theorem is a theorem, as per 1.4.7 
regardless of how, in Hilbert style or equationally, it was proved. 


Before we take a careful look at the layout of equational proofs, which is extremely 
important, let us take out of the way the “metatheory” part. 


So, what does a sequence of equivalences like the above do for us? 


The answer is provided by the following metatheorem. By the way, our relaxed
terminology theorem vs. metatheorem, agreed to on p. 51, would have us label special
cases like A ≡ B, B ≡ C, C ≡ D ⊢ A ≡ D “theorems”. What makes the following
definitely a “metatheorem” is its dependence on n.


2.2.1 Metatheorem.


    A₁ ≡ A₂, A₂ ≡ A₃, ..., Aₙ₋₁ ≡ Aₙ, Aₙ ≡ Aₙ₊₁ ⊢ A₁ ≡ Aₙ₊₁          (2)


61 This can lead to some ridiculously long, mathematically ugly proofs of trivial results.

Proof. This is seen by repeating the (derived) rule “Trans”. A rigorous proof is by
induction on n:
Basis. For n = 1 we want A₁ ≡ A₂ ⊢ A₁ ≡ A₂, which we got by 1.4.11.
Taking as I.H. the claim for n (it looks precisely as in (2) above) we establish the
claim for n + 1. That is, we want:

    A₁ ≡ A₂, A₂ ≡ A₃, ..., Aₙ₋₁ ≡ Aₙ, Aₙ ≡ Aₙ₊₁, Aₙ₊₁ ≡ Aₙ₊₂ ⊢ A₁ ≡ Aₙ₊₂          (3)

Here goes a proof of (3):
    (1)      A₁ ≡ A₂          (hypothesis)
    (2)      A₂ ≡ A₃          (hypothesis)
     ⋮
    (n)      Aₙ ≡ Aₙ₊₁        (hypothesis)
    (n + 1)  A₁ ≡ Aₙ₊₁        ((1)–(n) + I.H.)
    (n + 2)  Aₙ₊₁ ≡ Aₙ₊₂      (hypothesis)
    (n + 3)  A₁ ≡ Aₙ₊₂        ((n + 1) + (n + 2) + Trans)  □


2.2.2 Corollary. In an equational proof (from assumptions Γ) such as (1) on p. 62
we have Γ ⊢ A₁ ≡ Aₙ₊₁.


Proof. By 2.2.1 and 2.1.4. □


2.2.3 Corollary. In an equational proof (from assumptions Γ) such as (1) on p. 62
we have Γ ⊢ A₁ iff Γ ⊢ Aₙ₊₁.


Proof. By the previous corollary and Eqn. □


In practice, just as in trigonometry, if we want to prove A₁ (from Γ), then an
equational proof allows us to start with A₁ and be done as soon as we end up with
some known Γ-theorem Aₙ₊₁, as 2.2.3 above makes clear. Of course, we do not
have to start an equational proof of A with A, but we may do so if this is convenient,
or makes things more intuitively clear.

Corollary 2.2.2 tells us that a chain like (1) (Γ-) proves the equivalence A₁ ≡
Aₙ₊₁. Equational proofs tend to be very natural when it comes to proving equivalences,
but as 2.2.3 makes clear, they can also be used to prove formulae other than
equivalences (A₁ and Aₙ₊₁ could be anything at all).
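
The metatheory above is purely syntactic, but a chain of equivalences can also be sanity-checked by truth tables. The following Python sketch is only an informal illustration and is not part of the formal system: it assumes that formulae are modelled as Python functions from a valuation (a dict of truth values) to a bool, and the helper names `equivalent` and `check_chain` are hypothetical. It checks each step of a sample chain individually and then confirms that the first and last formulae agree, which is the semantic counterpart of what 2.2.1 asserts.

```python
from itertools import product

def equivalent(f, g, variables):
    """True if f and g agree under every valuation of the given variables."""
    for values in product([False, True], repeat=len(variables)):
        v = dict(zip(variables, values))
        if f(v) != g(v):
            return False
    return True

def check_chain(chain, variables):
    """Check every step A_i == A_{i+1} of the chain individually."""
    return all(equivalent(a, b, variables) for a, b in zip(chain, chain[1:]))

# A sample chain; `==` on bools plays the role of the equivalence connective.
chain = [
    lambda v: not (not v["A"]),            # not not A
    lambda v: (not v["A"]) == False,       # (not A) equiv false
    lambda v: (v["A"] == False) == False,  # A equiv false equiv false
    lambda v: v["A"] == True,              # A equiv true
    lambda v: v["A"],                      # A
]

steps_ok = check_chain(chain, ["A"])
ends_ok = equivalent(chain[0], chain[-1], ["A"])
print(steps_ok, ends_ok)
```

A truth-table check of this kind says nothing about how each step would be annotated; it only guards against writing down a step that cannot possibly be a theorem.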


2.3 EQUATIONAL PROOF LAYOUT 


To emphasize the importance of layout, we devote a section to this topic. The layout
is vertical, just like that of Hilbert-style proofs, but rather than writing (1) of p. 62 as

    A₁ ≡ A₂
    A₂ ≡ A₃
    ⋮                                   (i)
    Aₙ₋₁ ≡ Aₙ
    Aₙ ≡ Aₙ₊₁

we write it in the style of (E) on p. 61, that is, in a first approximation,

        A₁
    ≡ (annotation)
        A₂
    ≡ (annotation)
        ⋮                               (ii)
        Aₙ₋₁
    ≡ (annotation)
        Aₙ
    ≡ (annotation)
        Aₙ₊₁

Several remarks are in order: 


2.3.1 Remark. (1) Going from (i) to (ii) we have economized on writing, improving
readability at the same time by not repeating the “joining formulae”. By joining
formulae I mean, for each i, the Aᵢ₊₁ that occurs to the right of “≡” in Aᵢ ≡ Aᵢ₊₁
and to the left of “≡” in the immediately following equivalence, Aᵢ₊₁ ≡ Aᵢ₊₂.

Thus, (ii) implies a conjunctional use of the “≡” symbol that appears in the
leftmost column; that is, it is meant to say precisely what (i) says, i.e., A₁ ≡ A₂ and
A₂ ≡ A₃ and A₃ ≡ A₄, etc.

In other words, layout (ii) is a compact notation that depicts layout (i) and,
therefore, e.g., we can then infer the theorem A₁ ≡ Aₙ₊₁ via 2.2.1.

This is totally consistent with normal use of symbols such as “=” and “<” in
mathematics. In (E) (p. 61) we are using “=” conjunctionally. In an algebra course,
when we write the short “a < b < c” we mean the long “a < b and b < c”.

But wait a minute! “A ≡ B ≡ C” does not mean “A ≡ B and B ≡ C”.
The symbol “≡” is associative according to axiom schema (1) (cf. 1.4.4, p. 42), not
conjunctional.


Pause. So it is associative. But this does not preclude it from also being conjunctional,
does it? It does! We will come back to this question (cf. 3.1.6).


For now, we must take it simply on faith: “≡” is not conjunctional. We get around
this obstacle by inventing a new symbol (in the metatheory, of course!) to denote
conjunctional equivalence. This is an informal solution just as the practice in algebra
where “a < b < c” means “a < b and b < c” is informal.62 That is, I will give
neither definitions nor axioms for the new symbol, which we will denote by “⟺”.
This is our “conjunctional ≡” and will appear only in equational proofs and only on
their leftmost column at that.

Thus, “A ⟺ B ⟺ C” means only “A ≡ B and B ≡ C”.
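
As an informal aside, the difference between the conjunctional and the associative reading of a chain can be observed in a programming language that happens to chain comparisons conjunctionally. The Python sketch below is an illustration only: Python's `==` on truth values stands in for equivalence here, and that modelling choice is my assumption, not part of our logic. Explicit brackets force the left-to-right (associative-style) reading.

```python
# Python chains comparisons conjunctionally: a < b < c means (a < b) and (b < c).
print(2 < 4 < 2)      # conjunctional reading
print((2 < 4) < 2)    # bracketed reading: True < 2, and True counts as 1

# The same contrast for equivalence on truth values (== on bools).
A, B, C = False, True, False
conjunctional = (A == B) and (B == C)   # "A equiv B and B equiv C"
associative = (A == B) == C             # "(A equiv B) equiv C"
print(conjunctional, associative)
```

For this choice of truth values the two readings disagree, which is why a separate symbol for the conjunctional use is a sensible precaution.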

Reference [17] uses “=” for the conjunctional “≡”. As we will use “=” for formal
equality of non-Boolean objects in the predicate calculus part of the volume, and we
are already using it in the metatheory as the “ordinary” equals (for example, between
strings), we prefer not to overload this symbol further with yet a third meaning.

So “⟺” it is, and (ii) becomes

        A₁
    ⟺ (annotation)
        A₂
    ⟺ (annotation)
        A₃
        ⋮                               (Equational Proof Layout)
        Aₙ₋₁
    ⟺ (annotation)
        Aₙ
    ⟺ (annotation)
        Aₙ₊₁


(2) Informative annotation is mandatory! The annotation, also called hints in
some of the literature, is expected to clearly explain at each step (every Aᵢ ≡ Aᵢ₊₁
is one step63) precisely why Aᵢ ≡ Aᵢ₊₁ is a (Γ-) theorem. There may be any of the
following reasons:

(a) Proved earlier (with either a Hilbert-style or an equational proof).

(b) It is a logical axiom.

(c) It is an assumption (from whatever “Γ” we have in mind).

(d) We just gave a proof of Aᵢ ≡ Aᵢ₊₁ on the spot, recorded in the annotation.
Often such “on-the-spot” proofs are via the Leibniz rule. In this case, we must be
clear (in the annotation) of what “A ≡ B”-part we used (the rule’s “numerator”) and
why we are allowed to use it:


62 There is a weird (old) programming language called “Programming Language One” (you can guess
how old it is from its name), for short “PL/1”, which is quite likely the only general purpose programming
language where “<” is, unlike its use in mathematics, associative. In said language something like
2 < 4 < 2 evaluates to true for these reasons: (1) Absence of brackets means, in PL/1, that we evaluate
from left to right. (2) Thus 2 < 4 evaluates to true. This is represented by the Boolean value “true”
in PL/1, which a programmer writes down as “1B” (one nonzero bit). (3) PL/1 handles mixed-type
expressions eagerly and has elaborate rules that convert from type to type so operations are possible as far
as practicable. (4) Here “1B” is converted to the number “1” so that the comparison “1B < 2” can go
ahead. But 1 < 2 is true.

63 Note how I wrote “Aᵢ ≡ Aᵢ₊₁”. I use ⟺ only in the leftmost column of an equational proof.

Pause. Unless this A ≡ B is an axiom or a (Γ-) theorem (this includes (c) above)
we may not use it. The effect of using it outside these two cases is to introduce it as a
new assumption. This is normally unacceptable as it changes (augments) the set of
our original assumptions!

When using Leibniz we must also be very clear as to what the “C-part” is (cf. 1.4.2,
Inf1) and state any special requirements that we may have put on p, e.g., “freshness”.
Annotations will normally fit on one line. If not, we should put a note-mark in them
and continue the explanation outside the body of the equational proof. For Leibniz,
the suggested style of annotation is


              axiom                                      if among 1.4.4
    (Leib +   hypothesis                                 if in Γ          ; “C-part” ...)
              theorem, by its number in this volume64    otherwise
                                                                                □
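
To make the “numerator” and “C-part” terminology concrete, here is a small Python sketch of the substitution that a Leibniz step performs. It is an illustration under stated assumptions, not a definition from this volume: formulae are modelled as nested tuples, the names `substitute`, `value` and `valid` are hypothetical, and validity is checked by truth tables rather than by proof.

```python
from itertools import product

# Formulae as nested tuples: a variable is a string; compound formulae are
# ("not", X), ("or", X, Y), ("equiv", X, Y); the constant false is "bot".

def substitute(c, p, a):
    """Return C[p := A]: replace every occurrence of variable p in c by a."""
    if isinstance(c, str):
        return a if c == p else c
    return (c[0],) + tuple(substitute(x, p, a) for x in c[1:])

def value(f, v):
    """Truth value of formula f under valuation v (a dict)."""
    if f == "bot":
        return False
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not value(f[1], v)
    if f[0] == "or":
        return value(f[1], v) or value(f[2], v)
    return value(f[1], v) == value(f[2], v)   # "equiv"

def valid(f, variables):
    """True if f is true under every valuation of the listed variables."""
    return all(value(f, dict(zip(variables, vals)))
               for vals in product([False, True], repeat=len(variables)))

# "Numerator": (B equiv bot) equiv (not B).  "C-part": A equiv p.
lhs = ("equiv", "B", "bot")
rhs = ("not", "B")
c_part = ("equiv", "A", "p")
conclusion = ("equiv", substitute(c_part, "p", lhs), substitute(c_part, "p", rhs))
print(substitute(c_part, "p", lhs))
print(valid(("equiv", lhs, rhs), ["B"]), valid(conclusion, ["A", "B"]))
```

The point of the sketch is only the bookkeeping: the annotation names the numerator and the C-part, and the two lines of the step are the two substitution instances of the C-part.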
2.4 MORE PROOFS: ENRICHING OUR TOOLBOX 
2.4.1 Theorem. ⊢ ¬(A ≡ B) ≡ ¬A ≡ B
Proof. (Equational)

        ¬(A ≡ B)
    ⟺ (axiom)
        A ≡ B ≡ ⊥
    ⟺ (Leib + axiom: B ≡ ⊥ ≡ ⊥ ≡ B; “C-part” is A ≡ p; p fresh)
        A ≡ ⊥ ≡ B
    ⟺ (Leib + axiom: A ≡ ⊥ ≡ ¬A; “C-part” is p ≡ B; p fresh)
        ¬A ≡ B □


2.4.2 Remark. (1) Cf. the comment following our presentation of the axioms on 
p. 43. 

(2) Note that we use the minimum of brackets necessary in proofs. In particular,
note how the brackets around A ≡ B were dropped as soon as they became redundant
(end of first step above). Indeed, associativity (axiom (1)) is almost always at work,
unmentioned. Step two above views A ≡ B ≡ ⊥ as A ≡ (B ≡ ⊥).

(3) Strictly speaking, “A ≡ ⊥ ≡ ¬A” in our last annotation above is not an axiom.
It is a (trivial) theorem proved via axiom schema (2) like this (starting with what we
want, and ending with an axiom):

        A ≡ ⊥ ≡ ¬A
    ⟺ (axiom)
        ¬A ≡ A ≡ ⊥

Its trivial nature gave us “editorial license” to call “A ≡ ⊥ ≡ ¬A” an axiom. This
kind of “abuse of nomenclature” (it expresses our unconcern for permutations of
terms in a ≡-chain) will persist and receive definitive and general justification in
Remark 2.4.8.

64 In homework, this is fine. In tests/exams, one should either refer to the theorem by name (if it has
one) or state it explicitly.

(4) Just as with Hilbert-style proofs we need not specify which axiom we are using 
in steps annotated as “axiom”. This should be obvious from the axiom’s form. 

(5) As is often the case, the condition “p fresh” is an expedient overkill. For 
example, in the last occurrence of the condition above, I could have given the 
(longer) condition “p does not occur in B”, and that would still work—I just wanted 
the substitutions to leave B unchanged. 

(6) Increasingly I will be omitting the condition p fresh in obvious situations such
as the above. □


A trivial adaptation of the theorem’s proof yields:
2.4.3 Corollary. ⊢ ¬(A ≡ B) ≡ A ≡ ¬B
Proof. (Equational)

        ¬(A ≡ B)
    ⟺ (axiom)
        A ≡ B ≡ ⊥
    ⟺ (Leib + axiom: B ≡ ⊥ ≡ ¬B; “C-part” is A ≡ p; p fresh)
        A ≡ ¬B □


2.4.4 Theorem. (Double Negation) ⊢ ¬¬A ≡ A
Proof. (Equational)

        ¬¬A
    ⟺ (axiom)
        ¬A ≡ ⊥
    ⟺ (Leib + axiom: ¬A ≡ A ≡ ⊥; “C-part” is p ≡ ⊥)
        A ≡ ⊥ ≡ ⊥
    ⟺ (Leib + axiom: ⊤ ≡ ⊥ ≡ ⊥; “C-part” is A ≡ p)
        A ≡ ⊤
    ⟺ (redundant true, i.e., ⊢ A ≡ A ≡ ⊤)
        A □

(1) General Hint: Always plan to start from the more complex side of “≡” and use
our database of results, and our rules, to get it simpler and simpler, until you reach
the other side.


68 THEOREMS AND METATHEOREMS 


(2) A general technique, in the context of ≡-chains, worth imitating is to use
axioms (1) and (2) without notice, to bracket/unbracket, to move brackets around,
and to rearrange the order of subformulae separated by one “≡” (that occurs at
either end of the chain).65 The general annotation here, without details, would be
“axioms (1, 2) + Leib”.

(3) I have also made good on the earlier promise to be less pedantic and start omitting
far-too-obvious p fresh conditions.


2.4.5 Theorem. ⊢ ⊤ ≡ ¬⊥
Proof. (Equational)

        ⊤
    ⟺ (axiom)
        ⊥ ≡ ⊥
    ⟺ (axiom)
        ¬⊥ □

2.4.6 Corollary. ⊢ ⊥ ≡ ¬⊤
Proof. (Equational)

        ¬⊤
    ⟺ (Leib + 2.4.5; “C-part” is ¬p)
        ¬¬⊥
    ⟺ (2.4.4)
        ⊥ □

2.4.7 Theorem. ⊢ A ∨ ⊤
Proof. (Equational)

        A ∨ ⊤
    ⟺ (Leib + axiom: ⊤ ≡ ⊥ ≡ ⊥; “C-part” is A ∨ p; note inserted brackets!)
        A ∨ (⊥ ≡ ⊥)
    ⟺ (axiom)
        A ∨ ⊥ ≡ A ∨ ⊥ □


One normally does not draw attention to the obvious and leaves it unsaid in the proof: 
The last line is a theorem by 2.1.12. 


65 We soon show, in 2.4.8, that neither the hedging “separated by one” nor “(that is at either end of the
chain)” is necessary.

2.4.8 Remark. Axioms (5) and (6) are the unsung heroes in the case of ∨-chains just
like (1) and (2) are in the case of ≡-chains. To begin with, axiom (5), just as was
the case with ≡-chains and axiom (1), allows the insertion or omission of brackets
in a ∨-chain in any manner we please. Of course, the proof of this fact, exactly like
the one in 2.1.10, hinges also on theorem (†) below, which is the mirror image of
axiom (5):

    ⊢ A ∨ (B ∨ C) ≡ (A ∨ B) ∨ C                         (†)


One proves the above exactly analogously with 2.1.9.

Moreover, axiom (6) along with the Leibniz rule and the associativity of ∨ allows
you to prove that in a chain of ∨-signs you can swap any two subformulae, i.e.,
⊢ B ∨ C ∨ D ≡ D ∨ C ∨ B and, more generally, ⊢ A ∨ B ∨ C ∨ D ∨ E ≡
A ∨ D ∨ C ∨ B ∨ E. Indeed,


        B ∨ C ∨ D
    ⟺ (axiom (6) via (5), the latter allowing us to put brackets where we want them)
        D ∨ B ∨ C                                       (*)
    ⟺ (Leib + axiom (6). “C-part” is D ∨ p)
        D ∨ C ∨ B


From this we can easily prove the general case:


        A ∨ B ∨ C ∨ D ∨ E
    ⟺ (Leib + special case just proved. “C-part” is A ∨ p ∨ E)
        A ∨ D ∨ C ∨ B ∨ E


Entirely similar comments hold for ≡-chains (due to axioms (1) and (2)). This
can be seen by repeating the above two proofs, replacing ∨ by ≡ throughout, and
replacing references to axioms (5) and (6) by references to (1) and (2). For example,
⊢ (B ≡ C ≡ D) ≡ (D ≡ C ≡ B) is proved by rephrasing (*) above:


        B ≡ C ≡ D
    ⟺ (axiom (2) via (1), the latter allowing us to put brackets where we want them)
        D ≡ B ≡ C
    ⟺ (Leib + axiom (2). “C-part” is D ≡ p)
        D ≡ C ≡ B □
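
The freedom claimed in this remark can be observed on truth values. The Python sketch below is an informal brute-force check and not a proof in our logic; the helper names are hypothetical, only the two extreme bracketings of a chain are compared, and implication is included merely as a contrast case that fails the check.

```python
from functools import reduce
from itertools import permutations, product

def left_fold(op, values):
    """((x1 op x2) op x3) ... : the leftmost bracketing of a chain."""
    return reduce(op, values)

def right_fold(op, values):
    """x1 op (x2 op (x3 ...)): the rightmost bracketing of a chain."""
    return reduce(lambda acc, x: op(x, acc), reversed(values))

def chain_is_rearrangeable(op, n):
    """Check by brute force that an n-term op-chain has the same truth value
    under both extreme bracketings and under every permutation of its terms."""
    for values in product([False, True], repeat=n):
        reference = left_fold(op, values)
        if right_fold(op, values) != reference:
            return False
        if any(left_fold(op, perm) != reference for perm in permutations(values)):
            return False
    return True

equiv = lambda x, y: x == y
disj = lambda x, y: x or y
implies = lambda x, y: (not x) or y   # contrast case: not associative

print(chain_is_rearrangeable(equiv, 5), chain_is_rearrangeable(disj, 5))
print(chain_is_rearrangeable(implies, 3))
```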


The following occurs often. We might as well record it. The formulation is mindful 
of what we have just said in 2.4.8. 


2.4.9 Proposition. ⊢ (A ≡ B) ∨ (C ≡ D) ≡ A ∨ C ≡ B ∨ C ≡ A ∨ D ≡ B ∨ D


Proof.

        (A ≡ B) ∨ (C ≡ D)
    ⟺ (axiom)
        (A ≡ B) ∨ C ≡ (A ≡ B) ∨ D
    ⟺ (Leib + axiom; uses 2.4.8 implicitly!; “C-part” is p ≡ (A ≡ B) ∨ D)
        A ∨ C ≡ B ∨ C ≡ (A ≡ B) ∨ D
    ⟺ (Leib + axiom; uses 2.4.8 implicitly!; “C-part” is A ∨ C ≡ B ∨ C ≡ p)
        A ∨ C ≡ B ∨ C ≡ A ∨ D ≡ B ∨ D □


2.4.10 Theorem. ⊢ A ∨ ⊥ ≡ A


Proof. (Equational) Here is a case where (after some thought) we find it convenient
to deal with the entire formula.

        A ∨ ⊥ ≡ A
    ⟺ (Leib + axiom: A ≡ A ∨ A; “C-part” is A ∨ ⊥ ≡ p)
        A ∨ ⊥ ≡ A ∨ A
    ⟺ (axiom)
        A ∨ (⊥ ≡ A)
    ⟺ (Leib + axiom: ⊥ ≡ A ≡ ¬A; “C-part” is A ∨ p)
        A ∨ ¬A □


Strictly speaking, ⊥ ≡ A ≡ ¬A is a trivial theorem that axiom schema (4) yields
via 2.4.8. Let us skip to some provable properties of →:


2.4.11 Theorem. ⊢ A → B ≡ ¬A ∨ B


Proof.

        A → B
    ⟺ (axiom)
        A ∨ B ≡ B
    ⟺ (Leib + 2.4.10; “C-part” is A ∨ B ≡ p)
        A ∨ B ≡ ⊥ ∨ B
    ⟺ (axiom)
        (A ≡ ⊥) ∨ B
    ⟺ (Leib + axiom; “C-part” is p ∨ B)
        ¬A ∨ B □


2.4.12 Corollary. ⊢ ¬A ∨ B ≡ A ∨ B ≡ B


Proof. Drop the first two lines (including the annotation) in the previous proof. □

2.4.13 Corollary. ⊢ A → (B ≡ C) ≡ (A → B ≡ A → C)66


Proof.

        A → (B ≡ C)
    ⟺ (2.4.11)
        ¬A ∨ (B ≡ C)
    ⟺ (axiom)
        ¬A ∨ B ≡ ¬A ∨ C
    ⟺ (Leib + 2.4.11; “C-part” is p ≡ ¬A ∨ C)
        A → B ≡ ¬A ∨ C
    ⟺ (Leib + 2.4.11; “C-part” is A → B ≡ p)
        A → B ≡ A → C □


2.4.14 Remark. (1) It is good form to apply one Leibniz at a time, hence the last 
two steps. 


(2) [17] nickname 2.4.11 as “definition of →”, even though they prove it. In the
foundation of logic that we pursue in this volume 2.4.11 is, of course, no “definition”
at all, but is a theorem that relates the connectives →, ∨, ¬.

To be sure, there are alternative foundations of logic that employ only two primitive
propositional connectives: ∨ and ¬. All the rest of the connectives, e.g., →, are
introduced by definitions such as


    A → B  ≝  ¬A ∨ B                                    (1)


Such a definition says that the expression A → B is metatheoretical argot, an abbreviation
of ¬A ∨ B. This abbreviation introduces, as a side-effect, the metasymbol
→. But in our foundation the formal symbol → is a primitive of V and does not get
(re)introduced! □


Here is a real definition in our context. We introduce a new informal symbol (i.e.,
an abbreviation) named “≢” as follows:


2.4.15 Definition. A ≢ B  ≝  ¬(A ≡ B) □


How are abbreviations-by-definition used? Answer: Expand and go! That is, if I am
asked to prove

    ... A ≢ B ...

I prove instead

    ... ¬(A ≡ B) ...


66 The reader is reminded of the Boolean connectives’ priorities; cf. 1.1.11.




Here’s a rather surprising result:


2.4.16 Theorem. ⊢ ((A ≢ B) ≢ C) ≡ (A ≢ (B ≢ C))


Proof. Expanding, we see that we really want to prove the formula schema

    ¬(¬(A ≡ B) ≡ C) ≡ ¬(A ≡ ¬(B ≡ C))


        ¬(¬(A ≡ B) ≡ C)
    ⟺ (axiom)
        ¬(A ≡ B) ≡ C ≡ ⊥
    ⟺ (Leib + axiom; “C-part” is p ≡ C ≡ ⊥)
        A ≡ B ≡ ⊥ ≡ C ≡ ⊥
    ⟺ (by Remark 2.4.8)
        A ≡ B ≡ C ≡ ⊥ ≡ ⊥
    ⟺ (Leib + axiom; “C-part” is A ≡ p ≡ ⊥)
        A ≡ ¬(B ≡ C) ≡ ⊥
    ⟺ (axiom)
        ¬(A ≡ ¬(B ≡ C)) □
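
The result is less surprising once one notices that, on truth values, “not equivalent” behaves like exclusive or. The following Python sketch checks this informally by truth table; the function name `nequiv` is hypothetical, and the check is a semantic illustration rather than a proof in our calculus.

```python
from itertools import product

def nequiv(x, y):
    """'A not-equivalent B', expanded as in Definition 2.4.15: not (A equiv B)."""
    return not (x == y)

# Both bracketings of a three-term chain agree under all eight valuations.
associative = all(
    nequiv(nequiv(a, b), c) == nequiv(a, nequiv(b, c))
    for a, b, c in product([False, True], repeat=3)
)

# On truth values, 'not equivalent' coincides with exclusive or (Python's ^).
same_as_xor = all(
    nequiv(a, b) == (a ^ b) for a, b in product([False, True], repeat=2)
)
print(associative, same_as_xor)
```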


Here are some results for ∧:


2.4.17 Theorem. (de Morgan 1) ⊢ A ∧ B ≡ ¬(¬A ∨ ¬B)


Proof. This is a lengthy, but totally straightforward calculation, nothing to write
home about. Starting from the “complex” side, we have:

        ¬(¬A ∨ ¬B)
    ⟺ (axiom)
        ¬A ∨ ¬B ≡ ⊥
    ⟺ (Leib + 2.4.12; “C-part” is p ≡ ⊥)
        A ∨ ¬B ≡ ¬B ≡ ⊥
    ⟺ (Leib + axiom; “C-part” is A ∨ ¬B ≡ p; 2.4.8 used)
        A ∨ ¬B ≡ B
    ⟺ (Leib + 2.4.12; “C-part” is p ≡ B)
        A ∨ B ≡ A ≡ B
    ⟺ (axiom, with the help of 2.4.8)
        A ∧ B □

2.4.18 Corollary. (de Morgan 2) ⊢ A ∨ B ≡ ¬(¬A ∧ ¬B)


Proof. This can be proved from scratch totally by imitating the above proof and
swapping ∧ and ∨. It is more instructive (and shorter) to see that it actually follows
from 2.4.17.

        ¬(¬A ∧ ¬B)
    ⟺ (Leib + 2.4.17; “C-part” is ¬p)
        ¬¬(¬¬A ∨ ¬¬B)
    ⟺ (2.4.4)
        ¬¬A ∨ ¬¬B
    ⟺ (Leib + 2.4.4; “C-part” is p ∨ ¬¬B)
        A ∨ ¬¬B
    ⟺ (Leib + 2.4.4; “C-part” is A ∨ p)
        A ∨ B □


2.4.19 Theorem. ⊢ A ∧ A ≡ A


Proof.

        A ∧ A ≡ A
    ⟺ (axiom (and 2.4.8))
        A ∨ A ≡ A □


2.4.20 Theorem. ⊢ A ∧ ⊤ ≡ A
Proof.

        A ∧ ⊤ ≡ A
    ⟺ (axiom (and 2.4.8))
        A ∨ ⊤ ≡ ⊤
    ⟺ (redundant true (2.1.21))
        A ∨ ⊤ □


2.4.21 Theorem. ⊢ A ∧ ⊥ ≡ ⊥


Proof.

        A ∧ ⊥ ≡ ⊥
    ⟺ (axiom (and 2.4.8))
        A ∨ ⊥ ≡ A □

2.4.22 Exercise. Prove

(1) ⊢ A ∧ (B ∧ C) ≡ (A ∧ B) ∧ C

(2) ⊢ A ∧ B ≡ B ∧ A

(3) State and prove for ∧ the results corresponding to those proved in 2.4.8 for ∨
and ≡. □


Distributivity of ∨ over ∧ and of ∧ over ∨ are major tools for calculations. Again
there is nothing tricky about proving them; we just need to persevere because the
calculations are long.


2.4.23 Theorem. (Distributivity: ∨ over ∧ and ∧ over ∨)


(i) ⊢ A ∨ B ∧ C ≡ (A ∨ B) ∧ (A ∨ C)
and
(ii) ⊢ A ∧ (B ∨ C) ≡ A ∧ B ∨ A ∧ C


I could have written these with some redundant brackets to improve readability, but
I thought it also a good opportunity to prompt the reader to review priorities (1.1.11,
p. 15).

Proof. Just as is the case with the two de Morgan “laws” (2.4.17 and 2.4.18), we
can prove (i) from (ii) (and conversely, we can prove (ii) from (i)). Alternatively,
once one of the two is proved, a proof of the other can be extracted by systematically
swapping ∨ and ∧. Thus we only prove (i):

        (A ∨ B) ∧ (A ∨ C)
    ⟺ (axiom; cf. 2.4.8 too!)
        A ∨ B ∨ A ∨ C ≡ A ∨ B ≡ A ∨ C
    ⟺ (Leib + 2.4.8; “C-part” is p ≡ A ∨ B ≡ A ∨ C)
        A ∨ A ∨ B ∨ C ≡ A ∨ B ≡ A ∨ C
    ⟺ (Leib + axiom; “C-part” is p ∨ B ∨ C ≡ A ∨ B ≡ A ∨ C)
        A ∨ B ∨ C ≡ A ∨ B ≡ A ∨ C


Let us now take a leaf from trig-proof methodology’s book (expanding both sides
and trying to show them equal) and expand A ∨ B ∧ C:

        A ∨ B ∧ C
    ⟺ (Leib + axiom; “C-part” is A ∨ p; note inserted brackets!)
        A ∨ (B ∨ C ≡ B ≡ C)
    ⟺ (axiom; 2.4.8 remarks used to insert/remove brackets in any way we please67)
        A ∨ B ∨ C ≡ A ∨ (B ≡ C)
    ⟺ (Leib + axiom; “C-part” is A ∨ B ∨ C ≡ p)
        A ∨ B ∨ C ≡ A ∨ B ≡ A ∨ C

67 Here we viewed “B ∨ C ≡ B ≡ C” as “(B ∨ C) ≡ (B ≡ C)”.

We are done as both parts, left-hand side and right-hand side, are proved equivalent
to the same formula. The reason that this constitutes a (single) equational proof is
because we can write, say, the second subcalculation upside down and glue it to the
end of the first subcalculation, not repeating A ∨ B ∨ C ≡ A ∨ B ≡ A ∨ C, of
course. □


2.4.24 Corollary. ⊢ A ∨ B → C ≡ (A → C) ∧ (B → C)
Proof.

        A ∨ B → C
    ⟺ (2.4.11)
        ¬(A ∨ B) ∨ C
    ⟺ (Leib + 2.4.18; “C-part”: ¬p ∨ C)
        ¬¬(¬A ∧ ¬B) ∨ C
    ⟺ (Leib + 2.4.4; “C-part”: p ∨ C)
        (¬A ∧ ¬B) ∨ C
    ⟺ (2.4.23)
        (¬A ∨ C) ∧ (¬B ∨ C)
    ⟺ (obvious Leib, twice, + 2.4.11)
        (A → C) ∧ (B → C) □


2.4.25 Corollary. ⊢ A → B ∧ C ≡ (A → B) ∧ (A → C)
Proof.

        A → B ∧ C
    ⟺ (2.4.11)
        ¬A ∨ B ∧ C
    ⟺ (2.4.23)
        (¬A ∨ B) ∧ (¬A ∨ C)
    ⟺ (obvious Leib, twice,68 + 2.4.11)
        (A → B) ∧ (A → C) □


Here is a result connecting ≡ with → and ∧: Intuitively, one expects that “A ≡ B”
is the same as “(A → B) ∧ (B → A)”. Indeed, our toolbox allows us to prove that
much.


68 Okay: two simultaneous applications of Leibniz are allowed, if far too obvious.


2.4.26 Theorem. ⊢ A ≡ B ≡ (A → B) ∧ (B → A)


Proof. Here is the routine calculation:

        (A → B) ∧ (B → A)
    ⟺ (Leib + axiom; “C-part” is p ∧ (B → A))
        (A ∨ B ≡ B) ∧ (B → A)
    ⟺ (Leib + axiom; “C-part” is (A ∨ B ≡ B) ∧ p)
        (A ∨ B ≡ B) ∧ (B ∨ A ≡ A)
    ⟺ (Leib + axiom; “C-part” is (A ∨ B ≡ B) ∧ (p ≡ A))
        (A ∨ B ≡ B) ∧ (A ∨ B ≡ A)
    ⟺ (axiom (recall 2.4.8!))
        (A ∨ B ≡ B) ∨ (A ∨ B ≡ A) ≡ A ∨ B ≡ A ∨ B ≡ A ≡ B
    ⟺ (Leib + (2.1.12, 2.1.23);
         “C-part”: (A ∨ B ≡ B) ∨ (A ∨ B ≡ A) ≡ p ≡ A ≡ B)            (1)
        (A ∨ B ≡ B) ∨ (A ∨ B ≡ A) ≡ ⊤ ≡ A ≡ B
    ⟺ (Leib + 2.1.21; “C-part” is p ≡ A ≡ B)                          (2)
        (A ∨ B ≡ B) ∨ (A ∨ B ≡ A) ≡ A ≡ B
    ⟺ (Leib + 2.4.9, using 2.4.8 as needed!; “C-part” is p ≡ A ≡ B)
        A ∨ B ∨ A ∨ B ≡ A ∨ B ∨ B ≡ A ∨ A ∨ B ≡ A ∨ B ≡ A ≡ B
    ⟺ (A lot of Leib and 2.4.8 + X ∨ X ≡ X all at once!)
        A ∨ B ≡ A ∨ B ≡ A ∨ B ≡ A ∨ B ≡ A ≡ B
    ⟺ (Like step (1))
        ⊤ ≡ A ∨ B ≡ A ∨ B ≡ A ≡ B
    ⟺ (Like step (2))
        A ∨ B ≡ A ∨ B ≡ A ≡ B
    ⟺ (Like step (1))
        ⊤ ≡ A ≡ B
    ⟺ (Like step (2))
        A ≡ B □


2.5 USING SPECIAL AXIOMS IN EQUATIONAL PROOFS


At the heart of equational proofs where we have a nonempty Γ is Metatheorem 2.1.23.
It states that “Γ ⊢ A iff Γ ⊢ A ≡ ⊤”, which, if A ∈ Γ, implies69 Γ ⊢ A ≡ ⊤.


69 Cf. 1.4.11.

The key observation therefore is: Under any assumptions that include the formula
A, “A ≡ ⊤” is a (nonabsolute) theorem and therefore can be used in an application of
Leibniz (as the numerator of Inf1, 1.4.2) to replace occurrences of A by ⊤ anywhere
that we find such a replacement to serve our goals. Conversely, any occurrence of ⊤
may be replaced by any hypothesis A (cf. 2.5.1 (4)).

We start with four trivial examples.


2.5.1 Example.
(1) A, B ⊢ A ∧ B
(2) A ∨ A ⊢ A
(3) A ⊢ A ∨ B
(4) A ∧ B ⊢ A


For (1) we calculate as follows:

        A ∧ B
    ⟺ (Leib + assumption B + 2.1.23; “C-part”: A ∧ p)
        A ∧ ⊤
    ⟺ (2.4.20)
        A


I will not normally say this, but here, for emphasis: “We are done, since A is a
{A, B}-theorem” (of course, {A, B} is our “Γ”).
For (2) we calculate as follows:

        A
    ⟺ (axiom)
        A ∨ A


For (3) we calculate as follows:

        A ∨ B
    ⟺ (Leib + assumption A + 2.1.23; “C-part”: p ∨ B)
        ⊤ ∨ B                                           (cf. 2.4.7)


Example (4) has the trickiest proof, but is still very short. We calculate as follows:

        A
    ⟺ (2.4.20)
        A ∧ ⊤
    ⟺ (Leib + assumption A ∧ B + 2.1.23; “C-part”: A ∧ p; 2.4.22 is used below)
        A ∧ A ∧ B
    ⟺ (Leib + 2.4.19; “C-part”: p ∧ B)
        A ∧ B □


Results (1) and (4) above lead to the extremely important (for Hilbert-style and
especially resolution-style proofs) metatheorem:


2.5.2 Metatheorem. (Splitting/Merging Hypotheses) For any formulae A, B, C
and set Γ, we have Γ ∪ {A, B} ⊢ C iff Γ ∪ {A ∧ B} ⊢ C.


Proof. (Hilbert-style)
(I) Assume Γ ∪ {A, B} ⊢ C and prove Γ ∪ {A ∧ B} ⊢ C.

    (1) [Γ]      (a finite subset of Γ used to establish Γ ∪ {A, B} ⊢ C)70
    (2) A ∧ B    (hypothesis)
    (3) A        ((2), and (4) from 2.5.1)
    (4) B        ((2), and (4) from 2.5.1)
    (5) C        (hypothesis I, using (1), (3) and (4))


(II) Assume Γ ∪ {A ∧ B} ⊢ C and prove Γ ∪ {A, B} ⊢ C.

    (1) [Γ]      (a finite subset of Γ used to show Γ ∪ {A ∧ B} ⊢ C)
    (2) A        (hypothesis)
    (3) B        (hypothesis)
    (4) A ∧ B    ((2, 3), and (1) from 2.5.1)
    (5) C        (hypothesis II, using (1), and (4)) □


The above allows us (by a trivial induction on n) to merge any n separate hypotheses
into a single one and still derive the same theorems, and conversely to split any
assumption that is a conjunction of n conjuncts into n separate assumptions (the n
conjuncts). That is, it generalizes to


    Γ ∪ {A₁, A₂, ..., Aₙ} ⊢ B iff Γ ∪ {A₁ ∧ A₂ ∧ ... ∧ Aₙ} ⊢ B


The following is an extremely important derived rule for use in Hilbert-style proofs.
We will prove it equationally. It has a name: Modus Ponens (MP). You will note that
it is a stronger version of Eqn, in the sense that the hypothesis is weaker (A → B
rather than A ≡ B); therefore MP works harder, or is “smarter” than Eqn, as it
concludes the same with weaker hypotheses.


70 In Hilbert-style proofs a line must hold exactly one formula. Here, line (1) holds finitely many from Γ,
but we gave them one line number, collectively. The box around Γ is in recognition that we deviated from
the correct notation.

2.5.3 Theorem. (Modus Ponens) A, A → B ⊢ B


Proof.

        A → B
    ⟺ (2.4.11)
        ¬A ∨ B
    ⟺ (Leib + assumption A + 2.1.23; “C-part”: ¬p ∨ B)
        ¬⊤ ∨ B
    ⟺ (Leib + 2.4.6; “C-part”: p ∨ B)
        ⊥ ∨ B
    ⟺ (2.4.10)
        B □


A generalization of MP is also extremely important in Hilbert-style proofs, especially
those that we do by the so-called resolution technique. It is


    A ∨ B, ¬A ∨ C ⊢ B ∨ C


and originated in Gentzen’s natural deduction-style proofs. It is called the cut rule
since it “cuts out” a subformula that occurs both “positively” (A) and “negatively”
(¬A) in the two hypotheses, and then it glues what remains together, using a “∨” as
glue.
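
The “cut and glue” reading can be sketched mechanically. The Python fragment below is a hedged illustration in the style of resolution provers, not part of this volume's formalism: I assume that the formulae involved are clauses, modelled as frozensets of literals written as strings such as "A" or "~A", and the names `negate`, `cut` and `holds` are hypothetical. It forms the conclusion of the rule and then confirms, by truth table, that every valuation satisfying both hypotheses satisfies the conclusion.

```python
from itertools import product

def negate(lit):
    """Complement of a literal; literals are strings such as "A" or "~A"."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def cut(clause1, clause2, atom):
    """From (atom or B-part) and (~atom or C-part) form (B-part or C-part)."""
    if atom not in clause1 or negate(atom) not in clause2:
        raise ValueError("atom must occur positively in clause1, negatively in clause2")
    return (clause1 - {atom}) | (clause2 - {negate(atom)})

def holds(clause, v):
    """A clause (a disjunction of literals) is true if some literal is true."""
    return any((not v[l[1:]]) if l.startswith("~") else v[l] for l in clause)

c1 = frozenset({"A", "B"})     # A or B
c2 = frozenset({"~A", "C"})    # not A, or C
resolvent = cut(c1, c2, "A")   # B or C

# Truth-table check: whenever both hypotheses hold, so does the conclusion.
sound = all(
    holds(resolvent, v)
    for vals in product([False, True], repeat=3)
    for v in [dict(zip("ABC", vals))]
    if holds(c1, v) and holds(c2, v)
)
print(sorted(resolvent), sound)
```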


2.5.4 Theorem. (Cut Rule) A ∨ B, ¬A ∨ C ⊢ B ∨ C


Proof. We start with a subcalculation (a lemma) analyzing the most complicated
hypothesis, ¬A ∨ C:

        ¬A ∨ C
    ⟺ (2.4.12)
        A ∨ C ≡ C


Since ¬A ∨ C is a theorem under our assumptions, so is A ∨ C ≡ C and it can be
used in the following proof of B ∨ C from our two assumptions:

        B ∨ C
    ⟺ (Leib + lemma; “C-part”: B ∨ p)
        B ∨ (A ∨ C)
    ⟺ (2.4.8)
        (A ∨ B) ∨ C
    ⟺ (Leib + assumption A ∨ B + 2.1.23; “C-part”: p ∨ C)
        ⊤ ∨ C □

2.5.5 Corollary. A ∨ B, ¬A ∨ B ⊢ B


Proof. By 2.5.4, we have A ∨ B, ¬A ∨ B ⊢ B ∨ B. By 2.5.1(2), we have B ∨ B ⊢ B.
We are done by 2.1.4. □


2.5.6 Corollary. A ∨ B, ¬A ⊢ B
Proof. By 2.5.1(3), we have ¬A ⊢ ¬A ∨ B. We are done by 2.5.5. □


2.5.7 Corollary. A, ¬A ⊢ ⊥
Proof. By 2.5.5, where we take B to be the specific formula ⊥, using 2.4.10. □


2.5.8 Exercise. The proofs of the three previous corollaries deviated from our usual
pedantic Hilbert style. They look more like everyday proofs that a mathematician
might write.

Give Hilbert-style proofs for each. □


2.5.9 Corollary. (Transitivity of →) A → B, B → C ⊢ A → C
Proof. (Hilbert-style)

    (1) A → B              (assumption)
    (2) B → C              (assumption)
    (3) A → B ≡ ¬A ∨ B     (2.4.11)
    (4) B → C ≡ ¬B ∨ C     (2.4.11)
    (5) ¬A ∨ B             ((1, 3) + Eqn)
    (6) ¬B ∨ C             ((2, 4) + Eqn)
    (7) ¬A ∨ C             ((5, 6) + 2.5.4 (using 2.4.8)) □


2.5.10 Theorem. A → C, B → D ⊢ A ∨ B → C ∨ D


Proof. As in 2.5.4, analysis of the two hypotheses yields the theorems (theorems
from the hypotheses, that is!)

    A ∨ C ≡ C                                           (1)
and

    B ∨ D ≡ D                                           (2)


Informed by the above we calculate:

        A ∨ B → C ∨ D
    ⟺ (axiom + 2.4.8)
        A ∨ C ∨ B ∨ D ≡ C ∨ D
    ⟺ (Leib + (1); “C-part”: p ∨ B ∨ D ≡ C ∨ D)
        C ∨ B ∨ D ≡ C ∨ D
    ⟺ (Leib + (2); “C-part”: C ∨ p ≡ C ∨ D)
        C ∨ D ≡ C ∨ D □

2.5.11 Corollary. (Proof by Cases) A → C, B → C ⊢ A ∨ B → C
Proof. By 2.5.10, A → C, B → C ⊢ A ∨ B → C ∨ C. Since C ∨ C ≡ C is an
axiom, we are done via an obvious application of (Leib). □


2.5.12 Remark. (1) The name of 2.5.11 derives from what it says: To establish that
a formula C is implied by a disjunction A ∨ B it is sufficient to establish separately
both cases: A → C and B → C.

(2) Actually there is more to it: By 2.5.2, the result in 2.5.11 is equivalent to


    (A → C) ∧ (B → C) ⊢ A ∨ B → C                       (*)


However (*) is a direct result of 2.4.24 via one application of equanimity. The same
process also proves “the other direction”:


    A ∨ B → C ⊢ (A → C) ∧ (B → C)


(3) The following special case follows trivially, since ¬A ∨ A is an axiom, and
therefore ¬A ∨ A ≡ ⊤ by 2.1.21. It follows as easily from 2.5.4. We leave the
details to the reader. □


2.5.13 Corollary. A → C, ¬A → C ⊢ C
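
The derived rules of this section have been obtained syntactically. As an informal cross-check only, the Python sketch below tests the corresponding semantic claim for a few of them: every valuation that makes all hypotheses true makes the conclusion true. The helper names `entails` and `imp`, and the modelling of formulae as functions of a valuation, are my assumptions for the illustration; A, B, C are treated as propositional variables here, whereas in the rules they stand for arbitrary formulae.

```python
from itertools import product

def entails(hypotheses, conclusion, variables):
    """True if every valuation satisfying all hypotheses satisfies the conclusion."""
    for vals in product([False, True], repeat=len(variables)):
        v = dict(zip(variables, vals))
        if all(h(v) for h in hypotheses) and not conclusion(v):
            return False
    return True

imp = lambda x, y: (not x) or y   # truth table of implication

rules = {
    "modus ponens (2.5.3)": (
        [lambda v: v["A"], lambda v: imp(v["A"], v["B"])],
        lambda v: v["B"]),
    "cut rule (2.5.4)": (
        [lambda v: v["A"] or v["B"], lambda v: (not v["A"]) or v["C"]],
        lambda v: v["B"] or v["C"]),
    "proof by cases (2.5.11)": (
        [lambda v: imp(v["A"], v["C"]), lambda v: imp(v["B"], v["C"])],
        lambda v: imp(v["A"] or v["B"], v["C"])),
    "2.5.13": (
        [lambda v: imp(v["A"], v["C"]), lambda v: imp(not v["A"], v["C"])],
        lambda v: v["C"]),
}
results = {name: entails(hyps, concl, ["A", "B", "C"])
           for name, (hyps, concl) in rules.items()}
print(results)

# A non-rule for contrast: from A -> B and B we may not conclude A.
print(entails([lambda v: imp(v["A"], v["B"]), lambda v: v["B"]],
              lambda v: v["A"], ["A", "B"]))
```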


2.6 THE DEDUCTION THEOREM 


The main result in this section is, strictly speaking, a Metatheorem. But the above
title is its established nickname. It states:


2.6.1 Metatheorem. (Deduction Theorem) If Γ ∪ {A} ⊢ B, then also Γ ⊢ A → B.


Note. Due to the notational convention given on p. 53, the deduction theorem can be
also stated as “If Γ + A ⊢ B, then also Γ ⊢ A → B”. Much of the proof makes use
of the following result. So let us prove it separately, to avoid obscuring the proof of
the deduction theorem.


2.6.2 Lemma. A → (B ≡ C) ⊢ A → (D[p := B] ≡ D[p := C])


Proof. This is a theorem schema, of course, and we have handled several such
already. However, here we employ a new technique: We prove it not directly, but
instead by induction on the complexity of D, in essence metaproving it, first for the
least complex D, and then pushing the proof forward from less to more complex D.71


71 I will be interested to know if you can come up with a direct proof.

That this constitutes a metaproof is clear. Our logic does not “know” induction!
Basis. D has complexity 0: So it is one of:


(1) p: Then we must show A → (B ≡ C) ⊢ A → (B ≡ C) and we are done
by 1.4.11.


(2) q (other than p): Then we must show A → (B ≡ C) ⊢ A → (q ≡ q). Well,
start with ⊢ q ≡ q by 2.1.12. By (3) in 2.5.1, and 2.1.4, ⊢ A → (q ≡ q). We
are done by 2.1.1.


(3) ⊤ or ⊥: Same argument as in (2) above.


We now take the complex case, where D has complexity n + 1, on the I.H. that
the claim is true for all D (or whatever else you want to call them) with complexity
n or less. Throughout the rest of the argument we will not forget that A → (B ≡ C)
is an assumption.

We have several cases of how the formula, D, of complexity n + 1, was built: 


(i) D is ¬E. We calculate as follows (cf. 1.3.15):


    A → (¬E[p := B] ≡ ¬E[p := C])
⇔ ⟨Leibniz, twice along with 2.4.1⟩
    A → ¬¬(E[p := B] ≡ E[p := C])
⇔ ⟨Leibniz + 2.4.4; “C-part”: A → q⟩
    A → (E[p := B] ≡ E[p := C])

The last formula is an A → (B ≡ C)-theorem by the I.H., so this is true for the top
formula, too.


(ii) D is E ∨ G. In view of 2.4.11, we have the (absolute) theorem

    (A → (D[p := B] ≡ D[p := C])) ≡ (¬A ∨ (D[p := B] ≡ D[p := C]))


thus we will prove ¬A ∨ (D[p := B] ≡ D[p := C]) via the “∨ over ≡” distributivity
axiom, i.e., we will prove instead


    ¬A ∨ D[p := B] ≡ ¬A ∨ D[p := C]

To this end, we calculate as follows (cf. 1.3.15):


    ¬A ∨ E[p := B] ∨ G[p := B]
⇔ ⟨Leibniz + I.H.; “C-part”: q ∨ G[p := B]; 2.4.8 used as well⟩
    ¬A ∨ G[p := B] ∨ E[p := C]
⇔ ⟨Leibniz + I.H.; “C-part”: q ∨ E[p := C]; 2.4.8 used as well⟩
    ¬A ∨ E[p := C] ∨ G[p := C]


(iii) D is E ∧ G. We calculate as follows (cf. 1.3.15):


    A → E[p := B] ∧ G[p := B]
⇔ ⟨2.4.25⟩
    (A → E[p := B]) ∧ (A → G[p := B])
⇔ ⟨obvious Leib twice, using the I.H.⟩
    (A → E[p := C]) ∧ (A → G[p := C])
⇔ ⟨2.4.25⟩
    A → E[p := C] ∧ G[p := C]


(iv) D is E → G. We calculate similarly to the “D is E ∨ G” case (cf. also 1.3.15):


    ¬A ∨ ¬E[p := B] ∨ G[p := B]
⇔ ⟨Leib + I.H. on E + case (i); “C-part”: q ∨ G[p := B]; 2.4.8 applied implicitly⟩
    ¬A ∨ G[p := B] ∨ ¬E[p := C]
⇔ ⟨Leib + I.H. on G; “C-part”: q ∨ ¬E[p := C]; 2.4.8 applied implicitly⟩
    ¬A ∨ ¬E[p := C] ∨ G[p := C]


Finally, 
(v) D is E ≡ G. We calculate as follows, mindful of 2.4.13 (cf. also 1.3.15):


    A → (E[p := B] ≡ G[p := B])
⇔ ⟨2.4.13⟩
    A → E[p := B] ≡ A → G[p := B]
⇔ ⟨obvious Leib + I.H. twice⟩
    A → E[p := C] ≡ A → G[p := C]
⇔ ⟨2.4.13⟩
    A → (E[p := C] ≡ G[p := C]) □


Proof. (Of the Deduction Theorem) This is a metatheorem about proofs (equivalently,
about theorems). It says, essentially, that a proof of B from Γ and A must be
somehow transformable into a proof of A → B from Γ alone.

So how does one (meta)prove⁷² a (meta)theorem like this, that a proof from Γ + A
is so transformable?

By constructing the transformation by induction on the length of proof where B
occurs (1.4.5), of course!


⁷² Even a casual observer will see that this is a proof outside Boolean logic, using such extraneous tools as
induction.

Important! A (meta)proof by induction on the length of formal proofs of our
given logic is to use the formal definition of proof as in 1.4.5! Specifically, such
a metaproof will deal, in the induction step, only with the two primary rules of
inference (1.4.2) since each use of a derived rule in a formal proof can be eliminated
by “unwinding” (replacing the invocation of) the rule itself into a formal proof,
rigorously and completely written according to 1.4.5.


Basis. For Γ + A-proofs of length n = 1: Such a proof consists only of B, right?
But then, B must be one of the following (see 1.4.5!):


(i) A. Since ⊢ A → A (this is the same as stating ⊢ ¬A ∨ A, by 2.4.11 and Eqn),
we have that Γ ⊢ A → B in this case by 2.1.1, remembering that A is B.


(ii) In Γ ∪ Λ. Then (1.4.7), Γ ⊢ B. Since B ⊢ ¬A ∨ B by 2.5.1(3), we are
done (i.e., once more Γ ⊢ A → B) by 2.1.4.


We take for I.H. that the claim of the metatheorem is true for all Γ + A-proofs of
lengths n or less.


We now look at the induction step, the case where we have a Γ + A-proof of B,
one that has length n + 1. Our aim is to prove that under the I.H. Γ ⊢ A → B.


So in the very last step of this proof, we either wrote down B, or we did not: 


Case where we did not: Then B, which we assumed at the outset that this proof
proves, must have appeared earlier. Since we can truncate a proof by dropping any
length of tail (cf. 1.4.8) and still have a proof, it follows that (dropping, say, the last
formula of the proof) we have a proof of B of length n or less. By the I.H. we
conclude Γ ⊢ A → B in this case.


Case where we wrote B at the very last step: Now, there are two reasons why we 
may have legitimately written B (1.4.5): 


Subcase: B is one of: A, or is in Γ ∪ Λ. If so, then we have already argued in the
Basis that we will then have Γ ⊢ A → B.


Subcase: B was written down because we applied Eqn in the last step, so C ≡ B
and C have appeared in the proof earlier. By I.H. we have


    Γ ⊢ A → C                (1)


and
    Γ ⊢ A → (C ≡ B)


By 2.4.13 and Eqn, the latter yields
    Γ ⊢ A → C ≡ A → B        (2)
(1) and (2) yield Γ ⊢ A → B by Eqn.


Subcase: B was written down because we applied Leibniz in the last step, so B is
D[p := C] ≡ D[p := E] and C ≡ E has appeared in the proof earlier. By I.H.
Γ ⊢ A → (C ≡ E). Now Lemma 2.6.2, via 2.1.4, yields Γ ⊢ A → (D[p := C] ≡
D[p := E]), i.e., Γ ⊢ A → B. □


2.6.3 Corollary. Γ + A ⊢ B iff Γ ⊢ A → B


Proof. The left-to-right direction (“only if”) is 2.6.1. The “if” is by MP (2.5.3). □
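
The semantic counterpart of Corollary 2.6.3 can be tried out mechanically on small examples. The Python sketch below is an illustration only and no part of the formal development: it replaces ⊢ by tautological implication computed from truth tables (the two notions agree by the soundness and completeness results of Chapter 3). The helper names `states`, `taut_implies` and `implies`, and the sample Γ, A, B, are assumptions made for this example.

```python
from itertools import product

def states(variables):
    """Yield every assignment of truth values to the given variables."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def taut_implies(premises, conclusion, variables):
    """True iff every state satisfying all premises also satisfies the conclusion."""
    return all(conclusion(s)
               for s in states(variables)
               if all(g(s) for g in premises))

def implies(x, y):
    """Truth function of the connective →."""
    return (not x) or y

# Hypothetical sample: Γ = {p → q}, A is q → r, B is p → r.
gamma = [lambda s: implies(s["p"], s["q"])]
A = lambda s: implies(s["q"], s["r"])
B = lambda s: implies(s["p"], s["r"])
A_to_B = lambda s: implies(A(s), B(s))

left = taut_implies(gamma + [A], B, "pqr")    # Γ + A tautologically implies B
right = taut_implies(gamma, A_to_B, "pqr")    # Γ tautologically implies A → B
print(left, right)                            # True True

# A failing instance: replace B by the variable r; both sides fail together.
R = lambda s: s["r"]
left_bad = taut_implies(gamma + [A], R, "pqr")
right_bad = taut_implies(gamma, lambda s: implies(A(s), R(s)), "pqr")
print(left_bad, right_bad)                    # False False
```

In both samples the two sides agree, as the corollary predicts; a truth-table check of this kind can only illustrate the statement on chosen instances and does not replace the metaproof.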


2.6.4 Remark. What is the deduction theorem good for?

(1) The metatheoretician, who studies rather than uses logic, will proclaim: It
shows that all the theorems that we ever need to study are absolute, for whenever we
need to show A ⊢ B we can simply show ⊢ A → B, since the two are equivalent
(metatheoretical) statements by the above corollary.

As users of logic we dismiss this clinical point of view. Indeed, even in
metatheoretical work the deduction theorem comes in handy; e.g., see the proof of Post’s
theorem (3.2.1) in the next chapter.

(2) For the user, it is exactly the opposite point of view that is important:

To prove “⊢ A → B” is in general harder to do than to prove “A ⊢ B” because the
latter asks us to prove a less complex formula than A → B (just B) and it throws in
as a bonus an extra assumption, A, that is not available if we are proving ⊢ A → B.
An extra assumption almost always makes life easier; for it allows us to know more
before we start the proof.

The user will (almost) always prefer to tackle A ⊢ B, over ⊢ A → B. Indeed,
the deduction theorem is used extremely frequently by the practicing mathematician,
computer scientist, logician, and any other person who reasons logically. □


2.6.5 Exercise. Assume the deduction theorem, and prove Lemma 2.6.2 from it. □


2.6.6 Metatheorem. The following are equivalent:
(1) Γ ⊢ ⊥.
(2) For all A, we have Γ ⊢ A.
(3) For some B, we have Γ ⊢ B ∧ ¬B.


A Γ such as the one in the metatheorem is called inconsistent or contradictory. A Γ
that does not have the property (e.g., it fails to prove at least one formula, which is
the negation of (2)) is called consistent.

A formula of the form B ∧ ¬B is called a contradiction.


Proof. We show that from (1) follows (2); from (2) follows (3); from (3) follows (1). 


(1) to (2): Let A be arbitrary. From 2.5.1(3), we have ⊥ ⊢ ⊥ ∨ A. By the
assumption and 2.1.4, we get Γ ⊢ ⊥ ∨ A. We are done by 2.4.10 via Eqn.

(2) to (3): Since Γ ⊢ A is valid for any A, then for any⁷³ B (since B ∧ ¬B is a
formula) we have Γ ⊢ B ∧ ¬B.


⁷³ We have proved more than we were asked to: for all B, rather than for some B.


(3) to (1): Let B be chosen to satisfy (3). We argue that
    B ∧ ¬B ⊢ ⊥        (∗)


With (∗) out of the way, we are done by 2.1.4. As for (∗), it follows from 2.5.2 and
2.5.7. □


2.6.7 Corollary. (Proof by Contradiction) Γ ⊢ A iff Γ + ¬A ⊢ ⊥

Proof. If-part (right to left): By 2.6.1, the assumption yields
    Γ ⊢ ¬A → ⊥

But


    ¬A → ⊥
⇔ ⟨2.4.11⟩
    ¬¬A ∨ ⊥
⇔ ⟨2.4.10⟩
    ¬¬A
⇔ ⟨2.4.4⟩
    A

Hence Γ ⊢ A by Eqn.


Only if-part (left to right):


(1) ⟨a finite subset of Γ, collectively numbered (1), from which A is provable⟩
(2) ¬A ⟨assumption⟩
(3) A ⟨from (1), since we assume Γ ⊢ A⟩
(4) ⊥ ⟨(2, 3) and 2.5.7⟩ □


2.7 ADDITIONAL EXERCISES 


1. Give a proof of ⊢ A ∨ B ≡ A ∨ ¬B ≡ A.

2. Give a proof of ⊢ A ∧ (A ∨ B) ≡ A.

3. Give a proof of ⊢ A ∨ A ∧ B ≡ A.

4. Give a proof of ⊢ A ∧ B ∨ A ∧ ¬B ≡ A.

5. Give a proof of ⊢ A ≡ B ≡ (A ∧ B) ∨ (¬A ∧ ¬B).

6. Give a proof of ⊢ A → (B → C) ≡ (A → B) → (A → C).

7. Give a proof of A, B ⊢ A ≡ B.

8. Give a direct proof of A, ¬A ⊢ ⊥, not one via the cut rule.

9. Prove that A → B ⊢ C ∨ A → C ∨ B.

10. Prove that A → (B → C), B ⊢ A → C.

11. Prove that ⊢ A ∨ (B → A) ≡ B → A.

12. Prove that A ∨ A ∨ A ⊢ B → A.

13. Prove that if two logics have the same absolute theorems, then they have the same
relative theorems as well; that is, for every Γ and A, Γ proves A in one logic iff it
does so in the other.


14. Prove (ii) of 2.4.23 as a consequence of (i), i.e., using (i) as a hypothesis.

15. Prove (i) of 2.4.23 as a consequence of (ii).

16. Prove the theorem schema X → Y ≡ ¬Y → ¬X.

17. Prove the theorem schema A → ¬B → ¬(A → B).

18. Prove that ⊢ (A → B) → (¬A → B) → B.

19. Prove that ⊢ ((A → B) → A) → A.

20. Prove that ⊢ (A → C) → (B → C) → (A ∨ B → C).


CHAPTER 3


THE INTERPLAY BETWEEN SYNTAX 
AND SEMANTICS 


We have promised that syntactic proofs provide us with an alternative (nondeterministic,
that is, based on [educated] guessing) tool that we apply toward discovering
tautological implications and, in particular, tautologies (cf. discussion at the onset of
Section 1.4). We make good on this promise in this chapter, proving two metatheorems,
soundness and completeness (Post’s theorem) for Boolean logic. The first
states that our calculus is truthful, or sound, as people say technically. That is,
whenever ⊢ A, then also ⊨taut A, or more generally, if Γ ⊢ A, then also Γ ⊨taut A (3.1.3
below).

The second is a deeper result due to Post and provides the converse: If ⊨taut A,
then also ⊢ A, or more generally, if Γ ⊨taut A, then also Γ ⊢ A.

In short, there is nothing that we can establish via truth tables that we cannot do 
syntactically (via proofs), and vice versa. 

Another way to say this is that the axioms (and rules) are well chosen (cf. discussion 
on p. 38, in particular footnote 42): 

On one hand the axioms are “true” (technically, tautologies) and the rules preserve 
truth. This yields soundness. 

On the other, the chosen axioms (and rules) are “just the right ones” to ensure that
syntactic proofs are able to generate all tautologies (completeness).

To put it colloquially, our logic not only tells the truth (soundness), but it tells the
whole truth (completeness).


3.1 SOUNDNESS 


Clearly, to show that we have soundness in Boolean logic, we need to show that our 
rules propagate truth, and that the logical axioms are true (tautologies). 


3.1.1 Lemma. The two primary rules of inference preserve truth. That is,
    A, A ≡ B ⊨taut B                        (1)


and
    A ≡ B ⊨taut C[p := A] ≡ C[p := B]       (2)


Proof. The reader will want to review Definition 1.3.11.

(1) Let s be a state such that s(A) = t and s(A) = s(B).⁷⁴ But then s(B) = t.

(2) Let s be a state such that s(A ≡ B) = t, i.e., s(A) = s(B). We want to show
that s(C[p := A]) = s(C[p := B]).

Let us view the value s(C) as the result of substitution of values from the set
{f, t} into the variables qᵢ of a Boolean-type “function” f(q₁, q₂, ..., qₙ), where
the variables qᵢ, i = 1, ..., n, are precisely those that occur in C. Without loss of
generality, say q₁ is p. Thus, s(C) = f(s(p), s(q₂), s(q₃), ..., s(qₙ)).

Therefore,


    s(C[p := A]) = f(s(A), s(q₂), s(q₃), ..., s(qₙ))
and
    s(C[p := B]) = f(s(B), s(q₂), s(q₃), ..., s(qₙ))
Using the hypothesis we get s(C[p := A]) = s(C[p := B]). □


For the demanding reader who found the argument in (2) not rigorous enough, here
is a totally rigorous one, by induction on the complexity of C: We are given that
s(A ≡ B) = t, i.e., s(A) = s(B).

Basis. If C is any of ⊤, ⊥ or q (other than p), then C[p := A] ≡ C[p := B] is
C ≡ C; hence s(C ≡ C) = t since s(C) = s(C). If on the other hand C is p, then
C[p := A] ≡ C[p := B] is A ≡ B; hence s(A ≡ B) = t again.

For the induction step, we pick an arbitrary nonatomic C and prove


    s(C[p := A] ≡ C[p := B]) = t


that is
    s(C[p := A]) = s(C[p := B])        (1)


on the I.H. that the claim s(E[p := A]) = s(E[p := B]) is true for all formulae E less
complex than C.


⁷⁴ I will remind the reader that, in Part I, “=” is informal equality, outside our logic, and it is not to be
confused with the formal connective ≡; thus (cf. the truth table on p. 29) s(A ≡ B) = t iff s(A) = s(B).
We have cases according to what are the i.p. of C (cf. 1.1.10). 


Case 1: C is ¬D.


Now, C[p := A] ≡ C[p := B] is ¬(D[p := A]) ≡ ¬(D[p := B])
(cf. 1.3.15), and (1) follows from the I.H., since


    s(¬D[p := A]) = s(¬D[p := B]) iff s(D[p := A]) = s(D[p := B])


Case 2: Let next C be D ∨ E. The I.H. applies on D and E (i.p. of C).


Now, C[p := A] ≡ C[p := B] is (cf. 1.3.15) D[p := A] ∨ E[p := A] ≡
D[p := B] ∨ E[p := B]; hence we get (1) as follows:


    s(D[p := A] ∨ E[p := A]) = F∨(s(D[p := A]), s(E[p := A]))
                             = F∨(s(D[p := B]), s(E[p := B]))    ⟨by I.H.⟩
                             = s(D[p := B] ∨ E[p := B])


The cases where C is any of D ≡ E, D ∧ E or D → E are entirely similar to the
above and are omitted.
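
The substitution property just proved lends itself to a brute-force check on a sample. The sketch below is an illustration under stated assumptions: formulas are encoded as nested Python tuples (an encoding chosen here, not one used in the text), `ev` plays the role of the extension of a state s to all formulas, and `subst` plays the role of C[p := A]. The sample formulas C, A and B are hypothetical.

```python
from itertools import product

def ev(f, s):
    """Truth value of formula f in state s (a dict from variable names to bool)."""
    if f == "⊤":
        return True
    if f == "⊥":
        return False
    if isinstance(f, str):              # a Boolean variable
        return s[f]
    op, *args = f
    vals = [ev(a, s) for a in args]
    if op == "¬":
        return not vals[0]
    if op == "∧":
        return vals[0] and vals[1]
    if op == "∨":
        return vals[0] or vals[1]
    if op == "→":
        return (not vals[0]) or vals[1]
    if op == "≡":
        return vals[0] == vals[1]
    raise ValueError(f"unknown connective: {op}")

def subst(f, p, a):
    """C[p := A]: replace every occurrence of the variable p in f by the formula a."""
    if isinstance(f, str):
        return a if f == p else f
    op, *args = f
    return (op, *(subst(x, p, a) for x in args))

C = ("→", ("∨", "p", "q"), ("¬", "p"))
A = ("∧", "q", "r")
B = ("∨", "q", "r")      # agrees with A exactly in the states where s(q) = s(r)

ok = True
for values in product([False, True], repeat=3):
    s = dict(zip("pqr", values))
    if ev(A, s) == ev(B, s):            # hypothesis of (2): s(A) = s(B)
        ok = ok and ev(subst(C, "p", A), s) == ev(subst(C, "p", B), s)
print(ok)                                # True
```

The check only visits the states in which the hypothesis s(A) = s(B) holds, mirroring the way (2) is stated as a tautological implication rather than as a tautology.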


3.1.2 Exercise. Using truth tables, verify that all the logical axioms (1.4.4) are
tautologies. □


3.1.3 Metatheorem. (Soundness of Propositional Calculus) Γ ⊢ A implies that
Γ ⊨taut A.


The reader will observe that the proof below (the very first sentence in the Basis
case) is oblivious to exactly what Γ we have in mind. Thus it is valid for any Γ,
from empty to infinite.


Proof. We do induction on the length of Γ-proofs where A occurs.


Basis. Length n = 1. If A is in Γ, then certainly Γ ⊨taut A (cf. 1.3.11; any state
that satisfies Γ will do so for A in particular). If A is in Λ, then ⊨taut A as you
have verified in the exercise above. But then Γ ⊨taut A, since again any state s that
satisfies Γ will still make s(A) = t (any state whatsoever will make s(A) = t).


We now assume the claim for lengths n or less (I.H.).


Consider now the case where A occurs in a proof of length n + 1. There are 
subcases: 


(1) A is not the last formula in the proof. So the proof ending with A (obtained
by deleting the formulae following A; cf. 1.4.8) has length n or less. By I.H.
Γ ⊨taut A in this case.


(2) A is the last formula.

(2.1) Subcase where A ∈ Γ ∪ Λ is handled exactly as in the Basis.

(2.2) Subcase where A was written as a result of an application of Eqn. That is,
the proof contains some B and also B ≡ A, to the left of A. By I.H., Γ ⊨taut B and
Γ ⊨taut B ≡ A. Let now s be any state such that s(X) = t for all X in Γ. Thus,
s(B) = t and s(B) = s(A). Hence, s(A) = t.

(2.3) Subcase where A was written as a result of an application of Leib. So B ≡ C
occurs to the left of A, and A is D[p := B] ≡ D[p := C]. By I.H., Γ ⊨taut B ≡ C.
Let now s be any state such that s(X) = t for all X in Γ. Thus, s(B) = s(C).
Hence, s(D[p := B] ≡ D[p := C]) = t by Lemma 3.1.1. □


3.1.4 Corollary. If ⊢ A, then ⊨taut A.
Proof. Take Γ = ∅ in the above. □


3.1.5 Remark. (Counterexample Constructions in Boolean Logic) In what ways
is soundness useful?

(1) It tells us that we are on track with our “program” to have syntactic tools to
verify tautologies: Whatever these tools obtain (as absolute theorems) are tautologies.

(2) It allows us to disprove fallacious “results” of the form “such-and-such formula
is formally provable (in Boolean logic) from such-and-such assumptions”. For
example, “p ∨ q ⊢ p” is a false statement in the metatheory. It says that from the
assumption p ∨ q we can write a proof that contains (or ends with) p: Impossible!
Why? Well, if the claim were true, then we would also have p ∨ q ⊨taut p. This is
readily seen to be a false claim: Take any state s where s(p) = f and s(q) = t.

Similarly, ⊢ ⊥ is false (because ⊨taut ⊥ is). By the way, we indicate that “Γ ⊢ A
is false” (false metatheoretical statement) by writing Γ ⊬ A. □
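
The counterexample search described in (2) is easy to mechanize for small formulas. The following Python sketch is only an illustration: formulas are modelled as Python predicates on states, and the function name `counterexamples` is an assumption of this example, not terminology from the text.

```python
from itertools import product

def counterexamples(premises, conclusion, variables):
    """Return the states that satisfy every premise but falsify the conclusion."""
    found = []
    for values in product([False, True], repeat=len(variables)):
        s = dict(zip(variables, values))
        if all(g(s) for g in premises) and not conclusion(s):
            found.append(s)
    return found

# Is p a tautological consequence of p ∨ q?  Search for a refuting state.
bad = counterexamples([lambda s: s["p"] or s["q"]],   # premise p ∨ q
                      lambda s: s["p"],               # claimed conclusion p
                      ["p", "q"])
print(bad)   # [{'p': False, 'q': True}]
```

A nonempty result shows that the tautological implication fails, and then soundness rules out the corresponding provability claim. An empty result, by itself, says nothing about provability until Post's theorem is available.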


3.1.6 Example. We already stated on p. 64 that ≡ is not conjunctional, which is
why we invented its conjunctional cousin ⇔. Here is why:

The statement means that (1) below is not a theorem schema; i.e., for some specific
choices of A, B, C we end up with a nontheorem.


    A ≡ B ≡ C ≡ (A ≡ B) ∧ (B ≡ C)              (1)
Our job is to find specific formulae A, B, C that verify (2) below:

    ⊬ A ≡ B ≡ C ≡ (A ≡ B) ∧ (B ≡ C)            (2)
Let us try ⊤, ⊥, and ⊥ respectively and verify (2) by showing


    ⊭taut ⊤ ≡ ⊥ ≡ ⊥ ≡† (⊤ ≡ ⊥) ∧ (⊥ ≡ ⊥)       (3)


The subformula ⊤ ≡ ⊥ ≡ ⊥ to the left of the “≡” connective marked with “†”
evaluates as t in any state. Yet, the subformula (⊤ ≡ ⊥) ∧ (⊥ ≡ ⊥) to the right
evaluates as f. This verifies (3). □
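
The two evaluations in the example can be replayed directly. A small Python sketch, in which the helper `equiv` is a hypothetical name for the truth function of ≡:

```python
def equiv(x, y):
    """Truth function of ≡: true exactly when both arguments have the same value."""
    return x == y

T, F = True, False                      # the values of ⊤ and ⊥ in every state

left = equiv(equiv(T, F), F)            # ⊤ ≡ ⊥ ≡ ⊥, grouped to the left
left_alt = equiv(T, equiv(F, F))        # the same formula, grouped to the right
right = equiv(T, F) and equiv(F, F)     # (⊤ ≡ ⊥) ∧ (⊥ ≡ ⊥)

print(left, left_alt, right, equiv(left, right))   # True True False False
```

Both groupings of the left-hand side give t, so the outcome does not depend on how the chain of ≡ is bracketed, while the right-hand side gives f; the whole formula therefore evaluates to f.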


3.2 POST’S THEOREM 


3.2.1 Metatheorem. (Post’s Tautology Theorem) If Γ ⊨taut A, then Γ ⊢ A.


Proof. The proof of the metatheorem is, of course, informal and uses any tools we 
may be pleased to employ from the metatheory. 
It is most convenient to prove the contrapositive, namely, 


    If Γ ⊬ A, then Γ ⊭taut A        (1)


Digression. The term contrapositive refers to an implication. The contrapositive of
the formal implication “X → Y” is “¬Y → ¬X”. It is trivial to show (exercise!)
both

    ⊢ X → Y ≡ ¬Y → ¬X


and
    ⊨taut X → Y ≡ ¬Y → ¬X


Thus, in formal logic, guided by 1.4.2, 1.4.4, 1.4.5 and 1.4.7, proving ⊢ X → Y is
as good as proving ⊢ ¬Y → ¬X (by Eqn).

The term contrapositive also applies to all sorts of implications (including ⊢ and
⊨taut) in informal mathematics (metatheory). Thus, the contrapositive of “if [···],
then (...)” is “if not (...), then not [···]”. (Meta)proving one is as good as proving
the other.

Thinking commonsensically: Suppose I can prove the last of these two metastatements.

    Suppose I assume now that “[···]” is true        (∗)


Then it also must be that “(...)” is true, for if not, then I must have the opposite:
“not (...)” is true. As this implies not [···] it cannot be, for it contradicts my
assumption (∗).


Returning from our digression, which introduced a commonly used term of logic, 
we embark on our proof. This will consist of a few constructions along with a few 
claims—and their (meta)proofs—about the properties of the objects we construct. 

First, let us argue that 

Claim One. There is an enumeration


    G₀, G₁, G₂, ...        (2)
of all formulae of Boolean logic. That is, every formula appears in the infinite
array (2), and no string that is not a formula appears there.


Proof. [of Claim One] We may make a retroactive adjustment to the alphabet V
that makes it finite. This will change nothing that was said so far in this volume,
except a minute remark on p. 9: “Most variable symbols are formed through the
use of ‘subsymbols’ (such as 0, 1, 2, ′) that are not members of the alphabet V
themselves”, I said there. Well, let me backtrack over this comment, now including
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ′ in an amended V. But I am going to remove all the Boolean
variables, except the three p, q, and r, because I am going to build all the rest!

This idea is hardly revolutionary, and is entirely analogous to that of building the 
infinite set of natural numbers by using just two symbols, 0 and 1 (binary notation), or
the infinitely many variables of a programming language such as Algol from a finite 
set of symbols. In the latter case we start with the letters a,b,...,z,A,B,...,Z 
and the digits 0,1,...,9 and use the algorithm in quotes to build any variable: “A 
variable is a string that starts with a letter and continues (to the right) as far as we 
wish, using any letter or digit.” 

So, for the purposes of this section, we take V to be (commas not included) 


    p, q, r, ′, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ⊤, ⊥, (, ), ¬, ∧, ∨, →, ≡        (3)


The formation (syntax) of Boolean variables is now defined faithfully to what we
said in A1 (p. 9), but here we are more precise:

In A1 we implied that all variables are given at once (“donated” as it were) and
gave a couple of examples. Here instead we give a variable-construction rule similar
to the one for Algol’s variables, which will generate all Boolean variables, in essence
giving us a new variable any time we need one:


    A Boolean variable over the alphabet V given by (3) above is a string
    that starts with one of the letters p, q, r and continues with a block of zero
    or more primes (′) and then, optionally, with a string over the subalphabet
    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} that does not begin with 0.

    In writing a variable we write the block of primes as a superscript and the
    block of digits as a subscript. Thus, rather than p′′′123 we write, as on p. 9,
    p‴₁₂₃.


This view of variables facilitates the proof of Claim One. We note that we have 
not changed our set of variables, which corroborates the earlier claim that we need 
change nothing that we have said and proved so far—except for our concept of where 
variables come from. We might as well consider this “new” definition as the “firming 
up” of the one on p. 9 rather than a revision of it. 
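
The variable-construction rule stated above can be tried out as a regular expression. The following is a minimal Python sketch, assuming an ASCII rendering in which the prime is typed as an apostrophe and the superscript and subscript layout is flattened; the names `BOOLEAN_VARIABLE` and `is_variable` are hypothetical.

```python
import re

# Assumed ASCII rendering: the prime is typed as an apostrophe, and the
# superscript/subscript layout is flattened, so p with three primes and
# subscript 123 is typed as "p'''123".
BOOLEAN_VARIABLE = re.compile(r"[pqr]'*(?:[1-9][0-9]*)?")

def is_variable(s):
    """True iff the whole string s obeys the variable-construction rule."""
    return BOOLEAN_VARIABLE.fullmatch(s) is not None

for candidate in ["p", "q'", "r''7", "p'''123", "p0", "q012", "s", "p1'"]:
    print(candidate, is_variable(candidate))
```

The first four candidates are accepted. The remaining four are rejected: a digit block may not begin with 0, the leading letter must be p, q or r, and primes must precede the digits.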

The reader who has seen regular sets (e.g., in courses about UNIX, or theory
of computation) would agree that the word-definition above can be captured by the
following notation:⁷⁵


    {p, q, r}{′}* ({ε} ∪ {1, 2, 3, 4, 5, 6, 7, 8, 9}{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}*)        (4)


⁷⁵ Here are two or three quick words about “regular sets”. Let us denote, for the benefit of this footnote,
sets of strings by script capital Latin letters, 𝒜, ℬ, etc. Then, (1) 𝒜ℬ names the set of all strings we can
get by concatenating a string x ∈ 𝒜 to the left of a string y ∈ ℬ; (2) 𝒜* (the so-called “Kleene star”)
names the set of all strings of any length (even zero) that we can build using as building blocks any finite
number of strings from 𝒜; (3) 𝒜ℬ𝒞 means (𝒜ℬ)𝒞.

We can now build (2) simultaneously with the sequence of all strings over the
alphabet (3). The latter sequence we build alphabetically (lexicographically) by
listing strings by groups of increasing string-length, and within each length group
sorting them alphabetically. For the latter task we take the order of the symbols in (3)
as going from smaller to larger (“p” is smallest and “≡” is largest).


So how is (2) built? According to the following procedure, which runs forever:
    repeat forever:
        build the next string of the all-strings (over the alphabet (3)) sequence
        if it is a formula then write it as the next formula in sequence (2)
[End of proof of Claim One].⁷⁶ □


Thus the sequence of all strings over V (as in (3)!) looks like (commas not included)
    p, q, r, ′, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ⊤, ⊥, (, ), ¬, ∧, ∨, →, ≡, pp, pq, pr, p′, p0, p1, ...
and the first few entries of sequence (2) are


    p, q, r, ⊤, ⊥, p′, p₁, p₂, ...


These first eight are the G₀, G₁, G₂, G₃, G₄, G₅, G₆, G₇ of (2), in this order.
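
The enumeration procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's own algorithm text: the prime is typed as an apostrophe, and compound formulas are assumed to be fully bracketed, in the shapes (¬A) and (A ∘ B), which is consistent with the first eight entries listed above. The names `ALPHABET`, `parse_formula`, `is_formula`, `all_strings` and `formulas` are hypothetical.

```python
import re
from itertools import count, islice, product

# Alphabet (3), smallest symbol first; "'" stands in for the prime.
ALPHABET = ["p", "q", "r", "'", *"0123456789",
            "⊤", "⊥", "(", ")", "¬", "∧", "∨", "→", "≡"]
BINARY = {"∧", "∨", "→", "≡"}
VARIABLE = re.compile(r"[pqr]'*(?:[1-9][0-9]*)?")

def parse_formula(s, i=0):
    """Return the index just past a formula that starts at s[i], or None."""
    if i >= len(s):
        return None
    if s[i] in "⊤⊥":
        return i + 1
    if s[i] in "pqr":                       # a variable, read greedily
        return VARIABLE.match(s, i).end()
    if s[i] != "(":
        return None
    if i + 1 < len(s) and s[i + 1] == "¬":  # assumed shape (¬A)
        j = parse_formula(s, i + 2)
    else:                                   # assumed shape (A ∘ B)
        j = parse_formula(s, i + 1)
        if j is None or j >= len(s) or s[j] not in BINARY:
            return None
        j = parse_formula(s, j + 1)
    if j is None or j >= len(s) or s[j] != ")":
        return None
    return j + 1

def is_formula(s):
    return parse_formula(s) == len(s)

def all_strings():
    """Every nonempty string over ALPHABET: shorter first, then alphabetically."""
    for n in count(1):
        for symbols in product(ALPHABET, repeat=n):
            yield "".join(symbols)

def formulas():
    """Sequence (2): the all-strings sequence with the non-formulas skipped."""
    return (s for s in all_strings() if is_formula(s))

first_eight = list(islice(formulas(), 8))
print(first_eight)   # ['p', 'q', 'r', '⊤', '⊥', "p'", 'p1', 'p2']
```

The generator never terminates on its own, just as the procedure "runs forever"; `islice` simply cuts off a finite initial segment. Note that p0 is skipped because a digit block may not begin with 0.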
We now turn to the proof of (1) proper and assume the hypothesis side, 


    Γ ⊬ A        (5)


We next construct a set of formulae, Δ, which is as large as possible with the
properties that it includes Γ, but also


    Δ ⊬ A        (6)


We build Δ by stages, Δ₀, Δ₁, Δ₂, ... by an inductive definition, adding no more
than one formula at each step and aiming to satisfy Claim Six below.


Pause. The reader must have seen inductive definitions, at least of number-sequences:
For example, for a number x ≠ 0, the sequence of the nonnegative
powers of x (x⁰, x¹, x², ...) is given by x⁰ = 1 and, for n ≥ 0, xⁿ⁺¹ = x · xⁿ.
Another example is the famous Fibonacci sequence defined by F₀ = 0, F₁ = 1 and,
for n ≥ 1, Fₙ₊₁ = Fₙ + Fₙ₋₁.


⁷⁶ If we were to assume some knowledge of set theory, then all preceding acrobatics for the proof of
Claim One would become redundant: A string of length n ≥ 1 over V (whether V is finite or not) is a
member of the Cartesian power Vⁿ, which by a known theorem of set theory admits an enumeration (is an
“enumerable” set) because V does. But then, the set of all nonempty strings, i.e., ⋃_{n ≥ 1} Vⁿ, is enumerable
by a known theorem of set theory, and thus so is its infinite subset WFF by yet another theorem. However,
our original elementary proof is better in that it gives more information: Clearly, the effected enumeration
is algorithmic.


The Δₙ sequence:


    Δ₀ = Γ.

    For n ≥ 0,

                 if Δₙ ∪ {Gₙ} ⊬ A         then  Δₙ ∪ {Gₙ}
        Δₙ₊₁ =   else if Δₙ ∪ {¬Gₙ} ⊬ A   then  Δₙ ∪ {¬Gₙ}
                 else                            Δₙ


Thus, at each stage we add to the set that we are constructing at most one formula,
which is a member or a negation of a member of the sequence (2).


We define Δ by Δ = ⋃_{n ≥ 0} Δₙ, meaning “Δ = Δ₀ ∪ Δ₁ ∪ Δ₂ ∪ ···”, that is,
forming Δ as the set of all the members found in all the Δₙ.
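
To get a feel for the stage-by-stage construction, here is a small Python sketch run on a toy instance. It is only an illustration and rests on loud assumptions: provability is replaced by tautological implication over a fixed finite set of variables (legitimate only in hindsight, once soundness and Post's theorem are both available), formulas are modelled as predicates on states, and only a finite initial segment of the enumeration is processed. All names (`entails`, `neg`, `extend`, the sample Γ and A) are hypothetical.

```python
from itertools import product

VARS = ["p", "q"]
STATES = [dict(zip(VARS, vals)) for vals in product([False, True], repeat=len(VARS))]

def entails(premises, goal):
    """Tautological implication, used here as a stand-in for provability."""
    return all(goal(s) for s in STATES if all(g(s) for g in premises))

def neg(g):
    """The negation of a formula (formulas are modelled as predicates on states)."""
    return lambda s: not g(s)

def extend(gamma, a, enumeration):
    """Run the stages over a finite initial segment of the enumeration."""
    delta = list(gamma)                          # stage 0 is Γ itself
    for g in enumeration:
        if not entails(delta + [g], a):          # adding G does not yield A
            delta.append(g)
        elif not entails(delta + [neg(g)], a):   # adding ¬G does not yield A
            delta.append(neg(g))
        # the final "else" (leave the stage unchanged) needs no code
    return delta

p = lambda s: s["p"]
q = lambda s: s["q"]
p_or_q = lambda s: s["p"] or s["q"]

delta = extend([p_or_q], p, [p, q])              # Γ = {p ∨ q}, A is p
models = [s for s in STATES if all(g(s) for g in delta)]
print(models)   # [{'p': False, 'q': True}]
```

In this toy run the stages add ¬p and then q, and exactly one state satisfies everything collected. That state satisfies Γ and falsifies A, which is the kind of state the remainder of the proof goes on to construct.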


We state and prove a few claims about the Δₙ sequence and about Δ that contains
precisely all formulae found in all the Δₙ.


Claim Two. Γ ⊆ Δ. This follows at once from Δ₀ = Γ.


Claim Three. For n ≥ 0, Δₙ ⊬ A. This follows by a quick induction on n: For
n = 0 (Basis) the claim is true by (5). On the I.H. that the claim holds for n we see
that it so does for n + 1 by construction of Δₙ₊₁ (note that only the last “else” uses
the I.H.)


Claim Four. The last “else” case in the definition of Δₙ₊₁ is never applicable.
Indeed, the condition for that case is “Δₙ ∪ {Gₙ} ⊢ A and Δₙ ∪ {¬Gₙ} ⊢ A”. By
the deduction theorem (2.6.1) these two lead to Δₙ ⊢ Gₙ → A and Δₙ ⊢ ¬Gₙ → A.
These, by 2.5.13, give Δₙ ⊢ A, which by Claim Three cannot happen.


So why bother having the last case? Because it is proper mathematical manners, 
when we give a definition by cases, to have all possible cases present—including the 
“otherwise” (last “else”)—to ensure that what we are defining is defined under all 
possible circumstances. It is best to check whether the definition can be simplified 
(e.g., by dropping redundant cases) only after the definition is given rather than 
analyzing it to death a priori. 


Claim Five. Δ ⊬ A. Indeed, if we think otherwise, then, since proofs have finite
length and trivially Δₙ ⊆ Δₙ₊₁ for all n, there is an n, large enough, so that all
the Δ formulae used in the proof of Δ ⊢ A lie in Δₙ. So, Δₙ ⊢ A as well, contrary
to Claim Three.


Claim Six. For every formula B, either B is in Δ, or ¬B is in Δ, but not both.
Indeed, every B is some Gₙ in the sequence (2). By the construction of the Δₙ
sequence, and since the last “else” never applies (cf. Claim Four), we note that at
least one of B or ¬B will be added to Δₙ to form Δₙ₊₁. How about both B and
¬B being in Δ?⁷⁷ Then (2.5.7) Δ ⊢ ⊥, and hence Δ ⊢ A by 2.6.6, which cannot be
by Claim Five.

Claim Seven. Δ is deductively closed, that is, if Δ ⊢ B, then B ∈ Δ. Indeed, if
we thought for a minute that for some B it is possible to have Δ ⊢ B, and yet also
have B ∉ Δ, then (by Claim Six) ¬B will be in Δ. The latter implies (cf. 1.4.7) that
Δ ⊢ ¬B, which along with Δ ⊢ B and 2.5.7 yield Δ ⊢ ⊥ and thus Δ ⊢ A (2.6.6),
contradicting Claim Five.


⁷⁷ As some Gₘ and some Gₖ, m ≠ k. Naturally, they cannot be inserted at the same step, since a step
adds one formula to Δ.

The previous claim may be understood as saying that Δ is so big a set of assumptions
that anything you can prove from them, with any proof, can also be proved by
a proof of length one.


We are ready to define a state v that verifies the conclusion of (1).
Define a state v by setting, for each variable p, v(p) = t iff p ∈ Δ.        (7)
Main Claim. For all formulae B, v(B) = t iff B ∈ Δ.


Note: The claim also reads, by looking at the contrapositive (informally), “For all
formulae B, v(B) = f iff B ∉ Δ”.


The proof is by induction on the complexity of B. We have the following cases: 
(i) B is a variable. The claim is (7). 


(ii) B is the formula ⊤. Since v(⊤) = t, we want ⊤ ∈ Δ. By Claim Seven, it
suffices to have Δ ⊢ ⊤. We have this by 2.1.15 and 2.1.1.


(iii) B is the formula ⊥. Since v(⊥) = f, we want ⊥ ∉ Δ. Well, in the opposite
case, we would have Δ ⊢ ⊥, from which, via 2.6.6, we would also have Δ ⊢ A,
contradicting Claim Five.


(iv) B is ¬C. Say v(¬C) = t. Then v(C) = f and the I.H. yields (cf. Note
following the main claim) C ∉ Δ; hence ¬C is in Δ by Claim Six.


Conversely, if ¬C is in Δ, then C ∉ Δ by Claim Six. By the I.H. we have
v(C) = f; hence v(¬C) = t.


(v) B is C ∨ D. Say v(C ∨ D) = t. There are two cases, but we deal with one,
the other being similar: v(C) = t. By the I.H. C ∈ Δ. Hence (1.4.7) Δ ⊢ C.
It follows that Δ ⊢ C ∨ D by 2.5.1 and 2.1.4. Hence C ∨ D ∈ Δ by Claim
Seven.


Conversely, let C ∨ D ∈ Δ. It must be that at least one of C or D is in Δ, for if
not, then ¬C and ¬D are in Δ (Claim Six). Why is this impossible? Because,
by 2.5.6, Δ ⊢ D; hence Δ ⊢ ⊥ by 2.5.7. We have seen already that this cannot
be. Say then C ∈ Δ. By I.H. v(C) = t; hence v(C ∨ D) = t.


(vi) B is C ∧ D. Say v(C ∧ D) = t. Then v(C) = t and v(D) = t so that C
and D are in Δ by I.H. Then Δ ⊢ C ∧ D by 2.5.1; thus C ∧ D ∈ Δ by Claim
Seven.


Conversely, let C ∧ D ∈ Δ. Hence Δ ⊢ C and Δ ⊢ D by 2.5.1. Thus C ∈ Δ
and D ∈ Δ; hence (I.H.) v(C) = t = v(D), so v(C ∧ D) = t.


(vii) B is C → D. Say v(C → D) = t. There are two similar cases, v(C) = f
or v(D) = t. We just consider the first: By I.H., C ∉ Δ; thus ¬C is in Δ.
By 2.5.1 Δ ⊢ ¬C ∨ D; hence Δ ⊢ C → D by 2.4.11. Thus C → D is in Δ
(Claim Seven).


Conversely, let C → D be in Δ. Thus, Δ ⊢ ¬C ∨ D, by 2.4.11. By case (v),
we have ¬C or D (possibly both) are in Δ. If the former, then C ∉ Δ by Claim
Six; hence v(C) = f by I.H. It follows that v(C → D) = t. The other case is
as simple.


(viii) B is C ≡ D. Say v(C ≡ D) = t. There are two similar cases.


Case where v(C) = v(D) = t. By I.H. C and D are in Δ. Thus Δ ⊢ C ≡ D
as per calculation


    C ≡ D
⇔ ⟨Leib twice, using the assumptions C, D and redundant true (2.1.23)⟩
    ⊤ ≡ ⊤


Case where v(C) = v(D) = f. By I.H. neither of C and D is in Δ. Thus
both ¬C and ¬D are in, and, as before, Δ ⊢ ¬C ≡ ¬D. Using 2.4.3 twice
and 2.4.4 we conclude Δ ⊢ C ≡ D and are reduced to the previous case. Via
Claim Seven, both yield that C ≡ D is in Δ.


Conversely, let C ≡ D be in Δ. We argue that it is impossible to have exactly
one of C and D in Δ. Indeed, say that C is in and D is not. Thus ¬D is in.
As above, this entails Δ ⊢ C ≡ ¬D and, by 2.4.3, Δ ⊢ ¬(C ≡ D). Along
with the assumption this yields (2.5.7) Δ ⊢ ⊥, which we know is impossible.


Thus, either both C and D are in, where the I.H. furnishes v(C) = t = v(D),
or neither is in, where the I.H. furnishes v(C) = f = v(D). Both alternatives
yield v(C ≡ D) = t.


At the end of all this the reader is entitled to a coffee (no sugar, no milk) break.
After that, he can easily conclude the proof as follows: By the Main Claim, every
formula B in Δ, and hence every formula B in Γ since Γ ⊆ Δ, satisfies v(B) = t.
On the other hand, as Δ ⊬ A it must be A ∉ Δ; thus, again via the Main Claim,
v(A) = f. Therefore Γ ⊭taut A. This establishes (1). □


The proof of the Main Claim had too many inductive cases (corresponding to the
various cases of i.p.) because having the best interests of the user in mind (rather
than those of the metatheoretician), we adopted too many Boolean connectives as
primitive (just as [17] did, presumably for the same reason). Books and articles in
logic that write mostly about the metatheory often employ just ¬ and ∨ as primitive
connectives, which reduces the induction steps above to only two rather than five.

Post’s theorem is often called the completeness theorem of propositional calculus.
It shows that the syntactic manipulation apparatus completely captures the notion of
“truth” (tautologyhood) and “preservation of truth” (tautological implication) in the
Boolean case.

3.2.2 Corollary. If ⊨taut B, then ⊢ B.
Proof. Case of Γ = ∅. □


3.2.3 Exercise. Prove that the Δ constructed in the proof of Post’s theorem is infinite
even if Γ is finite.
Hint. Prove that Δₙ ≠ Δₙ₊₁ for all n. □


3.2.4 Exercise. Prove that if Γ ⊨taut A, then, for some finite Σ ⊆ Γ, we also have
Σ ⊨taut A. □


3.2.5 Exercise. (Compactness of Sentential Logic) Prove that if every finite subset
of a set of formulae Γ is satisfiable (cf. 1.3.11), then so is Γ. □


3.2.6 Exercise. Prove that a set of formulae Γ is satisfiable (cf. 1.3.11) iff it is
consistent (cf. p. 85). □


3.2.7 Exercise. Fully prove or disprove in the metatheory:
“For any set of formulae Γ and any formulae A and B, if Γ ⊢ A ∨ B, then it must
be Γ ⊢ A or Γ ⊢ B” □


3.3. FULL CIRCLE 


Post’s theorem is very convenient. It says that any (correct) schema A₁, ..., Aₙ ⊨taut
B leads to a derived rule of inference, A₁, ..., Aₙ ⊢ B. In particular, combining
with 2.1.4, we get


3.3.1 Corollary. If Γ ⊢ Aᵢ, for i = 1, ..., n, and if A₁, ..., Aₙ ⊨taut B, then
Γ ⊢ B.


This is a very important result. It frees the user of logic to use any tautological 
implication schema as a derived rule of inference in the progress of a proof. That is, 
while the rules Inf1 and Inf2 of 1.4.2 suffice to construct all “truths” starting from 
the “eleven original truths” (1.4.4)—and we already augmented them with all sorts 
of derived rules such as cut, MP, etc.—nevertheless, if convenience dictates, we can 
employ as an additional derived rule of inference any tautological implication schema 
that we happen to know, or happen to invent easily on the spur of the moment. Unless, 
of course, for the higher purpose of learning through hardship(!) we are constrained 
otherwise in an assigned question of a problem set or of a test/exam! 

In sum—unless otherwise requested!—we can, and will from now on, rigorously 
mix syntactic with semantic justifications of our Boolean proof steps. 


We have come full circle. We have started semantically, indicating that what 
matters in Boolean logic is to identify the “true” (tautologies) and “relatively true” 
(tautological implications of given premises) formulae. We indicated that in the 
present (and foreseeable) state of the art this is an extremely laborious process in
general. 

To compensate, logicians long ago discovered a systematic syntactic way
to exploit educated guessing in such verifications, and therefore often make such
verifications shorter.⁷⁸ Such methodology of educated guessing is what we have
called proofs.

The question remained whether such methodology—proofs—fully captures the 
ability of truth tables. With the settling of Post’s theorem (and soundness, 3.1.3) we 
saw that it does. 

Thus the semantic approach (using the truth values t and f and the “operations”
F¬, F∨, etc., on {f, t}) and the syntactic one (using proofs) are totally equivalent
and interchangeable.

This interchangeability is subject only to the caveat mentioned in the note
above.


3.3.2 Example. Here is an example of a tautology, hence a theorem by 3.2.2, which
we can easily verify semantically (“easily” means without going into truth tables or
proofs).
⊢ ((A → B) → A) → A (1)
So let us verify
⊨taut ((A → B) → A) → A (2)


How easily? I show that there is no state v that makes the formula in (2) f. Well, if
some state v does make it f, then A must be f,⁷⁹ but (A → B) → A must be t. Thus
A → B must be f. With A being f, this is impossible.

It is instructive for the reader to attempt a proof of (1) without using semantic
notions at all. □
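
The shortcut argument above can be cross-checked by brute force. The following is a minimal Python sketch, not part of the book's formal apparatus: it models a state as a tuple of truth values (True for t, False for f) and searches all states for one that falsifies a formula. The names `implies`, `is_tautology`, `falsifying_states`, and `peirce` are illustrative assumptions.

```python
from itertools import product


def implies(x: bool, y: bool) -> bool:
    """Truth function of the connective ->."""
    return (not x) or y


def falsifying_states(formula, num_vars: int):
    """List every state (tuple of truth values) that makes `formula` f."""
    return [state
            for state in product([False, True], repeat=num_vars)
            if not formula(*state)]


def is_tautology(formula, num_vars: int) -> bool:
    """A formula is a tautology iff no state falsifies it."""
    return not falsifying_states(formula, num_vars)


def peirce(a: bool, b: bool) -> bool:
    """The formula ((A -> B) -> A) -> A of Example 3.3.2."""
    return implies(implies(implies(a, b), a), a)
```

Calling `is_tautology(peirce, 2)` inspects all four states, whereas the argument in the example needed none; for a plain implication, `falsifying_states(implies, 2)` returns the single state in which the premise is t and the conclusion is f.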


3.4 SINGLE-FORMULA LEIBNIZ 


3.4.1 Example. The following is readily verifiable:⁸⁰
⊨taut (A ≡ B) → (E[p := A] ≡ E[p := B])


Thus, by 3.2.1,

⊢ (A ≡ B) → (E[p := A] ≡ E[p := B]) (SFL)
In [17], (SFL) plays an active role in a number of applications; there it is,
surprisingly, presented as the Leibniz Axiom despite the fact that it is provable. □


⁷⁸ The hedging “often” is appropriate at the present state of knowledge, as we have already remarked:
We do not know whether there is a “fast” (polynomial time) nondeterministic algorithm that recognizes
tautologies.

⁷⁹ “A is f” is colloquial for v(A) = f.

⁸⁰ Let v be a state where v(A ≡ B) = t. Thus, v(A) = v(B) = t, or v(A) = v(B) = f. Now apply
the proof of Lemma 3.1.1 to see that v(E[p := A]) = v(E[p := B]). The demanding reader may
glance at the detailed proof that supplements that of the lemma.
Two other ways to prove SFL, without reliance on Post’s Theorem, in the Boolean
logic as it is founded in this volume, that is, on the axioms 1.4.4 and rules 1.4.2, are
(1) Applying the deduction theorem (2.6.1) to


A ≡ B ⊢ E[p := A] ≡ E[p := B] (Leib)
(2) Using 2.6.2, which says
D → (A ≡ B) ⊢ D → (E[p := A] ≡ E[p := B])


Indeed, taking “D” to be A ≡ B in the above and noting that ⊢ (A ≡ B) →
(A ≡ B), we are done.

Can one prove SFL in the system presented in [17]? Yes, indeed. 

However, in [17] the deduction theorem is badly compromised by the presence of
the substitution rule as a primary rule:


A
---------- (We call p the eigenvariable) (Sub)
A[p := B]


Thus, in order to obtain a proof of SFL within the logic of [17], we would rather not
use the deduction theorem.⁸¹

Yet—and this is a fact not proved in [17], nor here, but a fact nevertheless for 
Boolean logics that do allow the substitution rule—Post’s theorem holds in the logic 
of [17] in the form of Corollary 3.2.2. 

In fact, any correctly founded Boolean logic will have as (absolute) theorems 
precisely all tautologies. Therefore the reason we gave in the preceding example for 
the theoremhood of SFL is good, not only in our logic, but also in that of [17]. 


Let us derive a few interesting results from SFL. 


3.4.2 Example. By SFL (which is an absolute theorem schema) we have
⊢ (p ≡ ⊤) → (C ≡ C[p := ⊤])


and
⊢ (p ≡ ⊥) → (C ≡ C[p := ⊥])


where we note that C[p := p] is just C. Using redundant true and the axiom
¬A ≡ A ≡ ⊥ (and Leib, of course), the above yield the equivalent formulations


⊢ p → (C ≡ C[p := ⊤]) (1)
and


⊢ ¬p → (C ≡ C[p := ⊥]) (2)


(1) and (2) yield
⊢ (C ≡ C[p := ⊤]) ∨ (C ≡ C[p := ⊥]) (3)


via the cut rule (2.5.4). □


⁸¹ For the record, the correct formulation of the deduction theorem in the system of [17] is: “If Γ + A ⊢ B
with a proof that never used an eigenvariable that occurs anywhere in A or the formulae of Γ, then
Γ ⊢ A → B.”


3.4.3 Example. Using SFL we can readily prove ⊢ (A ≡ B) → C[p := A] ≡ (A ≡
B) → C[p := B]. Indeed, SFL and 2.4.13 imply the above via Eqn. □


3.4.4 Example. We next verify ⊢ (A ≡ B) ∧ C[p := A] ≡ (A ≡ B) ∧ C[p := B].


(A ≡ B) ∧ C[p := A] ≡ (A ≡ B) ∧ C[p := B]
⇔ ⟨de Morgan and two applications of Leib (at once)⟩
¬(¬(A ≡ B) ∨ ¬C[p := A]) ≡ ¬(¬(A ≡ B) ∨ ¬C[p := B])
⇔ ⟨2.4.1 and two applications of Leib (at once)⟩
¬(A ≡ B) ∨ ¬C[p := A] ≡ ¬(A ≡ B) ∨ ¬C[p := B]
⇔ ⟨2.4.11 and two applications of Leib (at once)⟩
(A ≡ B) → ¬C[p := A] ≡ (A ≡ B) → ¬C[p := B]


The last line is the result of the previous example using ¬C rather than C. □


3.4.5 Example. It is instructive to offer a Hilbert-style proof of the above, without
invoking SFL, as it will introduce a general technique that some people call a
Ping-Pong argument. Ping-Pong arguments will be especially useful in Part II. This
technique of proving equivalences is extremely widespread outside the equational
methodology and is based on the theorem schema below (cf. 2.4.26):


⊢ (A ≡ B) ≡ (A → B) ∧ (B → A)


By equanimity,
Γ ⊢ A ≡ B iff Γ ⊢ (A → B) ∧ (B → A) (1)


By 2.5.1, (1) is equivalent to
Γ ⊢ A ≡ B iff we have both Γ ⊢ A → B and Γ ⊢ B → A (2)


Thus, to prove Γ ⊢ A ≡ B, one equivalently proves the two directions, “(→)”
(nickname of Γ ⊢ A → B) and “(←)” (nickname of Γ ⊢ B → A).

The technique is almost always used in conjunction with the deduction theorem.
That is, rather than showing Γ ⊢ A → B, one proves instead Γ + A ⊢ B.


Let us now re-prove ⊢ (A ≡ B) ∧ C[p := A] ≡ (A ≡ B) ∧ C[p := B].

(→)

(1) (A ≡ B) ∧ C[p := A]   ⟨assumption⟩
(2) A ≡ B                 ⟨(1) + 2.5.1⟩
(3) C[p := A]             ⟨(1) + 2.5.1⟩
(4) C[p := B]             ⟨(2, 3) + Eqn/Leib (2.1.16)⟩
(5) (A ≡ B) ∧ C[p := B]   ⟨(2, 4) + 2.5.1⟩

(←)

Entirely similar to (→), therefore omitted.


3.4.6 Exercise. Give a Ping-Pong argument (Hilbert-style) proof of 3.4.3, without
invoking SFL. □


3.4.7 Example. Using redundant true (use the special case ⊤ for B) on the above
two examples (3.4.3 and 3.4.4), we get:
⊢ A → C[p := A] ≡ A → C[p := ⊤]


and
⊢ A ∧ C[p := A] ≡ A ∧ C[p := ⊤]


If, moreover, we specialize A to p and note that C[p := p] is just C, then we get


⊢ p → C ≡ p → C[p := ⊤]


and
⊢ p ∧ C ≡ p ∧ C[p := ⊤] (S1)

□


3.4.8 Example. (Shannon) Using 3.4.4 with ⊥ for B, we get, via ⊢ ¬X ≡ X ≡ ⊥
and Leib,
⊢ ¬A ∧ C[p := A] ≡ ¬A ∧ C[p := ⊥]


In particular, if A is p,
⊢ ¬p ∧ C ≡ ¬p ∧ C[p := ⊥] (S2)
The (S1) of 3.4.7 and (S2) lead to the following simple calculation:


p ∧ C[p := ⊤] ∨ ¬p ∧ C[p := ⊥]
⇔ ⟨two obvious applications of Leib using (S1) of 3.4.7 and (S2)⟩
p ∧ C ∨ ¬p ∧ C
⇔ ⟨distributivity⟩
(p ∨ ¬p) ∧ C
⇔ ⟨Leib + excl. middle via 2.1.23; “C-part” is q ∧ C⟩
⊤ ∧ C
⇔ ⟨2.4.20⟩
C
Thus we have obtained ⊢ p ∧ C[p := ⊤] ∨ ¬p ∧ C[p := ⊥] ≡ C (Shannon). □
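
For readers who like to experiment, the Shannon schema can also be checked semantically on any concrete C. The sketch below is an illustration under assumptions of our own, not the book's notation: formulae are encoded as nested Python tuples, `subst` plays the role of C[p := B], and `is_tautology` runs through all states. All function names are hypothetical.

```python
from itertools import product

# Assumed encoding: ("var", name), ("const", bool), ("not", X),
# ("and", X, Y), ("or", X, Y), ("imp", X, Y), ("equiv", X, Y).
TOP, BOT = ("const", True), ("const", False)


def subst(c, p, b):
    """C[p := B]: replace every occurrence of the variable p in c by b."""
    tag = c[0]
    if tag == "var":
        return b if c[1] == p else c
    if tag == "const":
        return c
    return (tag,) + tuple(subst(x, p, b) for x in c[1:])


def value(c, state):
    """Truth value of c in a state (a dict from variable names to bools)."""
    tag = c[0]
    if tag == "var":
        return state[c[1]]
    if tag == "const":
        return c[1]
    args = [value(x, state) for x in c[1:]]
    if tag == "not":
        return not args[0]
    if tag == "and":
        return args[0] and args[1]
    if tag == "or":
        return args[0] or args[1]
    if tag == "imp":
        return (not args[0]) or args[1]
    if tag == "equiv":
        return args[0] == args[1]
    raise ValueError(f"unknown connective: {tag}")


def variables(c):
    """Set of variable names occurring in c."""
    if c[0] == "var":
        return {c[1]}
    if c[0] == "const":
        return set()
    return set().union(*(variables(x) for x in c[1:]))


def is_tautology(c):
    names = sorted(variables(c))
    return all(value(c, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))


def shannon(c, p):
    """The formula p ∧ C[p := ⊤] ∨ ¬p ∧ C[p := ⊥]."""
    pv = ("var", p)
    return ("or", ("and", pv, subst(c, p, TOP)),
                  ("and", ("not", pv), subst(c, p, BOT)))
```

For instance, with C taken to be p → (q ∧ ¬p), the call `is_tautology(("equiv", shannon(C, "p"), C))` confirms the equivalence for that one C. Such a check is only a semantic spot test; the calculation above is what establishes the schema for every C.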


In the presence of Post’s theorem and of the deduction theorem, SFL is highly 
redundant as a tool. We will not use it beyond these examples. 


Here is another highly redundant tool that we have already discussed (cf. pp. 42
and 101) and promised never to use: The substitution rule:


A
---------- when p does not appear in the special axioms (Sub)
A[p := B]


It turns out that it is a derived rule in our logic.


3.4.9 Exercise. Prove that if ⊢ A, then also ⊢ A[p := B] for any Boolean variable
p and formula B.
Hint. Either by induction on length of proofs, or using Post’s theorem. □


3.4.10 Exercise. Prove that if Γ ⊢ A, then also Γ ⊢ A[p := B] for any Boolean
variable p and formula B, as long as there is a proof of A from Γ where p occurs
in none of the formulae used from Γ. □


3.5 APPENDIX: RESOLUTION IN BOOLEAN LOGIC 


Resolution is a simple way to establish the validity of a particular configuration, or
configuration-schema, of the type Γ ⊢ A by essentially using just one rule, the cut
rule (cf. 2.5.4). It is a proof technique introduced by Robinson ([40]) and is based
on the metatheorem “Γ ⊢ A iff Γ + ¬A ⊢ ⊥” (cf. 2.6.7). It has been popular with
automatic theorem provers, that is, computer programs that prove theorems.

Of course, by the nature of the cut rule, in order to apply it easily on the premises
Γ + ¬A (which, in general, are schemata due to the presence of syntactic variables)
these must be, or must be brought into, the form


{C₁, C₂, ..., Cₙ}


where each Cᵢ, called a clause, is a disjunction of simple formulae, specifically of
type: atomic, negation of atomic, formula-variable,⁸² negation of formula-variable.
We call the formulae in these four categories literals.

If the premises are not of that form, one will apply a combination of simple
semantic or syntactic tools to convert them on an as-needed basis. For example, a
premise such as “A → B” would be replaced by the equivalent (cf. 2.4.11) “¬A ∨ B”,
which is a disjunction.

One additional feature of such proofs is that they are normally written in a
two-dimensional manner, as opposed to linear, and since, essentially, there is only
one rule in resolution, and the circumstances of its applicability are evident, one
dispenses with detailed annotation and uses as such two lines connecting the two
premises, A ∨ B and ¬A ∨ C, with the conclusion B ∨ C, that is, like this:


A ∨ B     ¬A ∨ C
     \     /
      B ∨ C


⁸² I.e., syntactic variable, such as A, B, ...

The technique is best illustrated via examples. Recall that the cut rule shown above 
has special cases, namely 2.5.5, 2.5.6, and 2.5.7. In the context of resolution, they 
are all instances of the cut rule. 


3.5.1 Example. Using resolution we prove the most general rule of proof by cases
(2.5.10), namely:
A → B, C → D ⊢ A ∨ C → B ∨ D


Using the deduction theorem, we need to show
A → B, C → D, A ∨ C ⊢ B ∨ D


that is, prove ⊥ from ¬A ∨ B, ¬C ∨ D, A ∨ C, ¬(B ∨ D), or, in other words (cf.
2.6.6 and the remark following it), prove that the set of formulae {¬A ∨ B, ¬C ∨
D, A ∨ C, ¬(B ∨ D)} is inconsistent. Here it goes:


¬A ∨ B     A ∨ C
     \     /
      B ∨ C     ¬C ∨ D
          \     /
           B ∨ D     ¬(B ∨ D)
               \     /
                  ⊥


3.5.2 Example. We next show
⊢ (A → (B → C)) → ((A → B) → (A → C))


We do some preprocessing to simplify this question. By deduction theorem, prove
instead
A → (B → C) ⊢ (A → B) → (A → C)


By two more applications of the deduction theorem, prove instead
A → (B → C), A → B, A ⊢ C
Therefore we need to show that the set
{¬A ∨ ¬B ∨ C, ¬A ∨ B, A, ¬C}


is inconsistent.


¬A ∨ ¬B ∨ C     ¬A ∨ B
         \      /
          ¬A ∨ C     ¬C
              \     /
               ¬A      A
                 \    /
                   ⊥


3.5.3 Example. Use resolution to prove


⊢ (A ∧ ¬B) → ¬(A → B)

By the deduction theorem prove instead A ∧ ¬B ⊢ ¬(A → B). Split hypotheses
(2.5.2) and move the negation (cf. 2.4.4 and 2.4.11) of the sought conclusion to the
hypotheses side. We get the hypotheses:


A, ¬B, ¬A ∨ B


Cut 1st and 3rd to get B. Cut this with ¬B to get ⊥. This (simple) case did not
require us to draw any lines. □
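
Since resolution is the technique of choice for automatic theorem provers, it may help to see the three examples above replayed mechanically. The following Python sketch is a naive saturation procedure written for illustration only; it is not Robinson's algorithm as implemented in any real prover. It assumes that literals are strings with a leading "~" for negation, that clauses are sets of literals, and that the formula-variables A, B, C, D are treated as atoms. In 3.5.1 it also splits ¬(B ∨ D) into the two unit clauses ¬B and ¬D, whereas the text cuts B ∨ D against ¬(B ∨ D) directly.

```python
from itertools import combinations


def negate(lit: str) -> str:
    """Complement of a literal: A <-> ~A."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1: frozenset, c2: frozenset):
    """All clauses obtainable by one application of cut to c1 and c2."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]


def refutable(clauses) -> bool:
    """True iff the empty clause (standing for ⊥) is derivable by cuts."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:          # empty clause: the set is inconsistent
                    return True
                new.add(r)
        if new <= known:           # nothing new: no refutation exists
            return False
        known |= new


# Clause sets of Examples 3.5.1, 3.5.2 and 3.5.3 (in that order).
EX_351 = [{"~A", "B"}, {"~C", "D"}, {"A", "C"}, {"~B"}, {"~D"}]
EX_352 = [{"~A", "~B", "C"}, {"~A", "B"}, {"A"}, {"~C"}]
EX_353 = [{"A"}, {"~B"}, {"~A", "B"}]
```

The loop terminates because only finitely many clauses can be formed from a finite stock of literals. A satisfiable set such as {A ∨ B, ¬A} saturates without ever producing the empty clause, so `refutable` returns False for it.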


3.6 ADDITIONAL EXERCISES 


1. We say that a Boolean formula A is in disjunctive normal form, or DNF, iff it is
of the form
D₁ ∨ D₂ ∨ ... ∨ Dₙ (DNF)


where each disjunct Dᵢ is a conjunction of the form C₁ ∧ C₂ ∧ ... ∧ Cₖ, and
each Cⱼ is a variable or a negated variable, and moreover all the variables of A
appear in each disjunct, and do so once.


Correspondingly, we say that a Boolean formula A is in conjunctive normal form,
or CNF, iff it is of the form


C₁ ∧ C₂ ∧ ... ∧ Cₙ (CNF)


where each conjunct Cᵢ is a disjunction of the form D₁ ∨ D₂ ∨ ... ∨ Dₖ, and
each Dⱼ is a variable or a negated variable, and moreover all the variables of A
appear in each conjunct, and do so once.


Prove

• Every formula B is provably equivalent either to ⊥ or to a formula A in
DNF that has the same variables as B.
Hint. Do induction on the number of variables in B. The induction step can
be helped by 3.4.8.

• Every formula B is provably equivalent either to ⊤ (hence is a theorem) or
to a formula A in CNF that has the same variables as B.


2. (a) Let A be a formula in which the variables p, q, r occur, but no others, and
whose truth table has a result t only in the rows f,t,f (state for p, q, r in
that order) and t,f,f. Show that A is provably equivalent to the formula
¬p ∧ q ∧ ¬r ∨ p ∧ ¬q ∧ ¬r.

Hint. Prove that ⊨taut A ≡ ¬p ∧ q ∧ ¬r ∨ p ∧ ¬q ∧ ¬r instead.


(b) Generalize the above to give an alternative proof for the first bullet in
Exercise 1.
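
The recipe behind Exercise 2 (one disjunct per row of the truth table with result t) is easy to mechanize, which may help in checking one's hand computations. The sketch below is an illustration, not a proof: it assumes a formula is given as a Python function of its variables, and the name `dnf_from_truth_table` is our own.

```python
from itertools import product


def dnf_from_truth_table(formula, names):
    """Build a DNF string with one disjunct per state that makes `formula` t.

    `formula` is a Python function of len(names) Boolean arguments;
    `names` lists the variable names in the order of the arguments.
    """
    disjuncts = []
    for state in product([False, True], repeat=len(names)):
        if formula(*state):
            literals = [n if bit else "¬" + n for n, bit in zip(names, state)]
            disjuncts.append(" ∧ ".join(literals))
    # No t-row at all: the formula is equivalent to ⊥ rather than to a DNF.
    return " ∨ ".join(disjuncts) if disjuncts else "⊥"
```

Applied to a function that is t exactly in the rows f,t,f and t,f,f, it returns the string ¬p ∧ q ∧ ¬r ∨ p ∧ ¬q ∧ ¬r of part (a). The exercise still asks for the metatheoretical argument that the formula and its DNF are provably equivalent.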


3. (a) Let B be a formula in which the variables p, q, r occur, but no others, and
whose truth table has a result f only in the rows f,t,f (state for p, q, r in
that order) and t,f,f. Show that B is provably equivalent to the formula
(p ∨ ¬q ∨ r) ∧ (¬p ∨ q ∨ r).
Hint. Prove that ⊨taut B ≡ (p ∨ ¬q ∨ r) ∧ (¬p ∨ q ∨ r) instead.


(b) Generalize the above to give an alternative proof for the second bullet in
Exercise 1.


4. What is the DNF of p → p ∨ q ∨ r?

5. What is the DNF of ¬p ∨ p ∨ q?

6. What is the CNF of ¬p ∧ p?

7. What is the CNF of p ∧ q ∧ r ∧ p′ → ⊥?


8. Use resolution (in combination with the deduction theorem), but not Post’s
theorem, to prove ⊢ A ∨ (B ∧ C) → A ∨ B.


9. Use resolution (in combination with the deduction theorem), but not Post’s
theorem, to prove ⊢ (A → B) → (A → C) → (A → B ∧ C).


10. Use resolution (in combination with the deduction theorem), but not Post’s
theorem, to prove ⊢ (p ∨ q ∨ r) ∧ (p → p′) ∧ (q → p′) ∧ (r → p′) → p′.


11. Use resolution (in combination with the deduction theorem), but not Post’s
theorem, to prove ⊢ (p → (q → r)) → (q → (p → r)).


12. Suppose that Γ is a set of assumptions, and A, B are two formulae.

We know that if Γ ⊢ A ∧ B, then Γ ⊢ A and Γ ⊢ B.

Is it also true that if Γ ⊢ A ∨ B, then Γ ⊢ A or Γ ⊢ B?

If yes, then give a (meta)proof for any Γ, A, B.

If no, then use soundness (3.1.3) to give a definitive counterexample for
appropriately chosen Γ, A, B.


13. Give a proof of ⊢ A → (B → C) ≡ (A → B) → (A → C) by a Ping-Pong
argument.


14. Prove the following absolute theorem schemata. The use of Post’s theorem is not
allowed in this exercise.

• A ∨ A → A

• A → A ∨ B

• A ∨ B → B ∨ A

• (A → B) → (C ∨ A → C ∨ B)


15. For all A, B show that ⊢ A → B ∨ A. The use of Post’s theorem is not allowed
in this exercise.
16. Show that for all A and B, we have ⊢ A → B → A. The use of Post’s theorem
is not allowed in this exercise.


17. Use the deduction theorem and resolution (but not Post’s theorem!) to prove
⊢ (p → (q → r)) → (p → q) → (p → r)

18. Use the deduction theorem and resolution (but not Post’s theorem!) to prove

⊢ (p ∧ ¬q) → ¬(p → q)
19. Use the deduction theorem and resolution (but not Post’s theorem!) to prove 
pogerpyArBorh Aap Bog pi 
20. Use the deduction theorem and resolution (but not Post’s theorem!) to prove 
⊢ (¬B → ¬A) → (¬B → A) → B

21. Use the deduction theorem and resolution (but not Post’s theorem!) to prove

⊢ (A ∨ B ∨ ¬C) ∧ (A → B) → (C → B)


22. Use the deduction theorem and resolution (but not Post’s theorem!) to prove


⊢ ((A → B) → A) → A
PREDICATE LOGIC


CHAPTER 4


EXTENDING BOOLEAN LOGIC 


By now we must possess (assuming we did a lot of exercises!) a pretty solid 
technique for proving theorems of Boolean logic. Is this skill (toolbox) sufficient 
toward reasoning in mathematics and computer science? 

I regret to say that it is totally insufficient. You see, computer science and 
mathematics are talking—and contain reasoning and theorems—about objects such 
as sets, strings, numbers, matrices, trees, graphs, programs, models of computation 
(such as “Turing machines”), and many others. 

On the other hand, Boolean logic talks only about the Boolean connectives, and 
how using them we can formulate the truth of extremely general statements, which 
do not express any specific statement that involves any of the previously mentioned 
objects. For example, we cannot formulate the statement, and much less reason about
its truth: “Every natural number greater than 1 has a prime factor”. Propositional logic
does not know what is a “number” (even less so which numbers are “natural”), what
is the meaning of “greater”, what is “1”, what is a “prime”, and what is a “factor”.
The statements that we can write down in Boole’s logic (and then derive conclusions 
about them using logic’s proving tools) are abstractions of mathematical statements, 
that is, statements where all the details about what mathematical objects we are 
talking about—and exactly what we are saying—have been deleted. 
In particular, we may think of an arbitrary Boolean variable as an abstract
mathematical statement, one whose details we do not know, or ignore and hide, doing so
in the interest of simplifying reasoning (cf. p. 6). Boolean logic is unable to further 
analyze these atomic statements. 


4.0.1 Example. The two statements of mathematics
Any two sets y and z are equal if they have exactly the same elements (1)


and
An object x is equal to itself (2)


have the mathematical formulations⁸³
(∀y)(∀z)((∀x)(x ∈ y ≡ x ∈ z) → y = z) (1’)


and
x = x (2’)


respectively, in the language of everyday mathematics.

Statement (1’) is about sets, and it happens to be true in set theory. Statement (2’)
is a philosophical principle true everywhere, in all of mathematics, not just set theory.

Yet, if we attempted to formulate (1’) and (2’) in the language of Boolean logic, we 
could just manage to say that both are captured(!) by the same Boolean “statement”: 
p (where, of course, any other Boolean variable will do). Thus, in the Boolean 
formulation (which is a high level of abstraction), we would totally lose the intrinsic— 
and very different—meanings of these two mathematical statements! 

What is the reason? Logic exploits the logical structure of statements (i.e.,
formulae) and through the use of axioms and syntactic proof-writing rules aims to
verify those statements that are true.⁸⁴

Now propositional logic can only see and reason about the propositional structure
of statements, i.e., how they are put together via Boolean connectives.

This logic sees no connectives in (2’) because there are none. It sees no
connectives in (1’) either, because all such are hidden inside the so-called scope of
“(∀y)(∀z)”, that is, the area between the two big brackets. Boolean logic cannot get
into this scope since it can neither see nor manipulate its “gate-keepers”, the so-called
quantifiers (∀y)(∀z); moreover it cannot see the mathematical objects y and z that
these quantifiers refer to.

What is the result? Boolean logic, in its inability to see any logical structure of
the type it understands in either of (1’) or (2’), “believes” that it just sees atomic
formulae in both cases! □


⁸³ “(∀x)” is pronounced “for all x”, thus “Any two (sets) y and z” is mathematized by “for all y, for all
z”, that is, “(∀y)(∀z)”.

⁸⁴ These aims are not fully realized because not all true statements of set theory or Peano number theory
can be so verified, as Gödel showed in [16].
Clearly, in order to do mathematics, we need to expand the language of (Boolean)
logic so that we can write down statements about objects such as trees, numbers, sets,
and the like.

So we need, at a minimum,⁸⁵ symbols for specific objects (constants) and
unspecified objects (object variables). We will also need, at a minimum, a way to say
that two objects are the same.⁸⁶

We should be happy to know that all that we have learned about Boolean logic 
will be useful and readily applicable in predicate logic. There is nothing that we need 
to discard or unlearn. 


4.1 THE FIRST-ORDER LANGUAGE OF PREDICATE LOGIC 


As already remarked, we need to extend the language of Boolean logic, if we are to 
use logic to reason in mathematics and computer science. 

First, we will need an infinite supply of object variables, that is, variables that are
anything but Boolean. For such we will use the letters x, y, z, u, v, w with or without
subscripts or primes. For example, x, u, w44 are all acceptable (names of) object
variables.


As on p. 94, where we generated the infinite list of Boolean variables using just a finite
set of subsymbols (p, q, r, ′, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9), we may do so here for object
variables, using x, y, z, u, v, w, ′, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.⁸⁷ But we need not worry
about this in the main body of Part II (see, however, the Appendix, Section A.4).
This “do-not-read-me” comment is solely for the benefit of the picky reader.
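
For the picky reader who also programs: the finite-subsymbol recipe amounts to a regular expression. The sketch below is one possible rendering in Python, under two assumptions of our own: the prime is typed as the ASCII apostrophe, and a subscript is written inline as a decimal numeral without leading zero. The names `OBJECT_VARIABLE` and `is_object_variable` are hypothetical.

```python
import re

# One of the letters x, y, z, u, v, w, then zero or more primes (typed
# here as ASCII apostrophes), then an optional subscript: a decimal
# numeral that does not start with 0.
OBJECT_VARIABLE = re.compile(r"[xyzuvw]'*(?:[1-9][0-9]*)?")


def is_object_variable(s: str) -> bool:
    """True iff the whole string s is a well-formed object-variable name."""
    return OBJECT_VARIABLE.fullmatch(s) is not None
```

Under these conventions `x`, `u''` and `w44` are accepted, while `p` (a Boolean variable), `x05` (leading zero) and `xy` (two letters) are rejected.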


By the way, we will drop the qualifier object from now on, but we will continue 
using the qualifier Boolean for Boolean variables. 

When we use logic to reason in computer science or mathematics, we need,
besides variables, additional objects. For example, when we do number theory (that
is, the theory of the natural numbers N = {0, 1, 2, 3, ...}) we also need symbols
for constants (e.g., 0), functions (e.g., +, ×), and predicates, that is, names of
relations (e.g., <).

Thus, the first-order language⁸⁸ of predicate logic will build on that of Boolean
logic by adding to the Boolean alphabet (cf. p. 9) the following:


(1) Symbols for object variables (x, y, z′5, ...).

(2) A symbol for equality between non-Boolean objects. We use “=”.

(3) A symbol for “for all”: the quantifier “∀”.

(4) Symbols for constants (non-Boolean!).
(5) Symbols for functions.
(6) Symbols for predicates.


⁸⁵ “At a minimum” because we need also axioms and rules so we can verify by syntactic means the “truth”
of what we write down.

⁸⁶ For objects of Boolean type, that is, formulae, “≡” does that for us.

⁸⁷ In this approach, the object variables will be the members of the regular set
{x, y, z, u, v, w}{′}*({ε} ∪ {1, 2, 3, 4, 5, 6, 7, 8, 9}{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}*).

⁸⁸ Why “first-order”? I will explain soon.


The qualifier “symbol(s) for” will be henceforth understood and therefore omitted. 

Items (1)-(3) are mandatory in the alphabet regardless of where we plan to apply 
our predicate logic. Thus, since they are independent of application, we call them 
logical symbols, just as we so call all the symbols we inherited from the Boolean 
alphabet. 

However, what precise symbols of types (4)-(6) we employ in a first-order lan- 
guage depends on the branch of mathematics, or computer science, where we want 
to apply predicate logic (as a proving tool), and for that reason these are called 
nonlogical symbols. 


4.1.1 Example. (Some Examples of Nonlogical Symbols) To do number theory we
employ one constant, 0, three functions, +, × and S (where S is the name of the
“+1 function”, the so-called successor), and one predicate, <.


Pause. But what about other constants (1, 2, 11, 100056) and other functions
(e.g., exponentiation) and other predicates (e.g., ≤)?


All these can be introduced in terms of the given primitive symbols using
definitions. It is not in our scope to say how this is done (some such definitions are quite
tricky, e.g., the one for exponentiation) but here are some easy ones: “1” is defined
as S0, “2” as SS0, “5” as SSSSS0, where we wrote “Sx” for “S(x)”. Recalling
that S is intended to signify, hence intended to behave like, the successor function,
it is clear that these definitions are sensible. After all, (0 + 1) + 1 = 2, etc.

Also, ≤ is easily introduced by the definition “x ≤ y abbreviates x = y ∨ x < y”.

To do set theory one need employ no constants and no functions, but just one
predicate, ∈. All other familiar (and all the unfamiliar) symbols of set theory are
built via definitions, using just ∈. For example, constants such as ∅, predicates such
as ⊆, and functions such as ∪ are all built in terms of ∈. □
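
The easy definitions for number theory in the example above can be mimicked in a few lines of Python. This is only an informal sketch: numerals are modelled as strings, the intended meaning of S as “+1” is built into `value_of`, and the helper names `numeral`, `value_of`, and `leq` are assumptions of ours rather than anything fixed by the text.

```python
def numeral(n: int) -> str:
    """The term S...S0 (n copies of S) that is abbreviated by the numeral n."""
    if n < 0:
        raise ValueError("natural numbers only")
    return "S" * n + "0"


def value_of(term: str) -> int:
    """Intended value of a term of the form S...S0: count the S's."""
    if not term.endswith("0") or set(term[:-1]) - {"S"}:
        raise ValueError(f"not of the form S...S0: {term!r}")
    return len(term) - 1


def leq(x: int, y: int) -> bool:
    """x ≤ y, introduced as an abbreviation of x = y ∨ x < y."""
    return x == y or x < y
```

Thus `numeral(2)` is the string `SS0`, and `value_of` recovers 2 from it. In the logic itself the behavior of S is enforced by special axioms, not by counting characters.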


The intended behavior of nonlogical symbols is enforced in each application of
predicate calculus (such as number theory) by special axioms. These
behavior-enforcing axioms are naturally called special axioms, but also nonlogical
axioms. By their nature they are not universally applicable to all of predicate logic;
rather they are specific to one application.

I should mention that what I call an application here (or applied first-order logic,
if you will) we nowadays normally call a theory. That is, a theory is a toolbox
using which we can, in principle, generate all the theorems that together describe the
behavior and properties of selected mathematical objects (for example, the sets and
the relation ∈). This toolbox consists of:


(i) A first-order language that has a hand-picked, specific, set of nonlogical symbols
that is appropriate for the intended application (e.g., for set theory we just
include the predicate ∈; nothing else).

(ii) Special axioms that give the basic properties of the nonlogical symbols.
Intuitively, these state selected fundamental “relative truths” that characterize the
theory.

(iii) The logical axioms that are common to all theories. Intuitively, these state
“absolute truths” that are valid in all theories.

(iv) Rules of inference (see the next section).


With tools (i)-(iv) we can generate theorems, as will be discussed in the next section.


The purpose of Part II of this volume is to thoroughly acquaint the reader with 
what predicate calculus is, and to equip the reader with a solid technique in using 
this calculus in any application. For this reason, when we teach the calculus, and 
train the reader in its use, we are obliged to formulate it in terms of unspecified 
nonlogical symbols, so that we can talk about all possible first-order languages in a 
unified manner. 

Thus, we will denote, in the general description of the alphabet below, the
constants by generic names a, b, c, the functions by generic names f, g, h, and the
predicates by generic names φ, ψ.


These notational conventions will not stop us from giving examples from time to time, 
where symbols from a specific first-order language are used, e.g., from the language 
of number theory. 

In summary, we have: 


4.1.2 Definition. (The Alphabet of the General First-Order Language) A first-order
alphabet consists of a logical part (logical symbols) and a nonlogical part
(nonlogical symbols).

The fixed part for all first-order alphabets is the set of logical symbols, which is
an extension of the Boolean alphabet (p. 9). All such alphabets include precisely
L1-L7:


L1. Boolean variables.⁸⁹ These are p, q, r, with or without primes or subscripts,
e.g., p′, q13, r8.
We also have a supply of Boolean metavariables, p, q, r, q′, ..., precisely as
on p. 9 and for the reasons already explained there.


L2. Object variables, or just “variables”. These are x, y, z, u, v, w, with or without
primes or subscripts, e.g., z, u, x′, w13, v24.


As in the case for Boolean variables, we will often need to write down expressions
such as “if ⊢ A, then ⊢ (∀x)A, for any x” (cf. 6.1.3 later on).


This immediately creates a difficulty: What do we mean by “any x”? There
is only one specific x. Thus we employ object metavariables to name or point
to arbitrary object variables. The symbols for those are chosen analogously
with those for the Boolean case: x, y, z, u, v, w, with or without primes or
subscripts, e.g., x, u, x′, w13, v24.


Now it is all right to say “if ⊢ A, then ⊢ (∀x)A, for any x” or indeed just “if
⊢ A, then ⊢ (∀x)A” since x points to an arbitrary object variable, thus rendering
the part “for any x” redundant. Remember: “for any x” refers to the variety of
what x names, not to a variety of metavariables!


⁸⁹ Recall that we promised not to say “symbol for”. Of course, everything in the alphabet is just a symbol.


L3. Two symbols for Boolean constants, namely ⊤ and ⊥.


L4. Brackets, namely, ( and ).


L5. Boolean connectives, namely, the symbols listed below, separated by commas⁹⁰


¬, ∧, ∨, →, ≡


L6. The equality of objects symbol, “=”.
L7. The universal quantifier symbol, “∀”.


The variable part for all first-order alphabets is the set of nonlogical symbols, a set that
is application-specific. We have a different alphabet for each different application.

We can talk about any first-order language without being pinned down to any
specific application by using generic symbols. Thus a first-order alphabet must also
contain:


NL1. Zero or more object constants,⁹¹ which in an application-independent fashion
are denoted generically by a, b, c, i.e., lowercase Latin letters from the
beginning of the alphabet, with or without primes or subscripts; e.g., a″, b13, c19.


NL2. Zero or more functions, denoted generically by f, g, h (just these lowercase
Latin letters!) with or without primes or subscripts; e.g., f′, g1009.
Each function has an arity, and this strange word means “the number of
arguments that the function can take”. This is significant; see the definition of
terms below.


NL3. Zero or more predicates, denoted generically by the letters φ, ψ (just these
two lowercase Greek letters!) with or without primes or subscripts; e.g.,
φ′, ψ1509. Each predicate too has an arity, and this also is significant;
see the definition of atomic formulae below. □


4.1.3 Remark. (1) The term arity was made up by mathematicians and logicians. It
is derived from words such as “binary”, “unary”, “ternary”. A ternary function has
three arguments: Its arity is 3.

(2) How does one know the arity of, say, f″12? Well, this is a silly question; “f″12”
is one of (infinitely) many generic symbols that we use to denote functions when we
do not want to work in a specific theory and its language. Just as in algebra we say
“Let f be a function of ten variables ...” and thus introduce into the discussion the
symbol f, accordingly, when we are training in predicate logic we can say similar
things: “Let f″12 be a function of arity 9 ...” Thus it is not an issue of knowing: f″12
is what we say it is; it has no fixed status. In a different application-independent
discussion we may find ourselves saying “Let f″12 be a function of arity 99 ...”, and
this is perfectly fine.


⁹⁰ The commas are not part of the alphabet.

⁹¹ Symbols for constants.

On the other hand, if one speaks a specific language, say, that of set theory, then
the symbol ∈ is the same throughout set theory. We cannot say today “Let ∈ have
arity 3” and say tomorrow “Let ∈ have arity 1”. In fact, its arity is fixed once and for
all at the time the alphabet for the language of set theory is given (in this case it is 2).

(3) We were tempted to use the generic names P, Q, R for predicates, but that
clashes with the naming of formulae (see 1.1.2). Hence we suggested φ and ψ.

You might ask: So, what is wrong with using P,Q, R and living with the clash? 
Is not a predicate a formula, after all? 

No, not really. For example, “<” is a predicate, and it is clear (on intuitive grounds,
even before we give Definition 4.1.13 below) that it is not a formula. On the other
hand, “2 < 3” is a formula, but it is not a predicate.

A configuration consisting of a predicate acting on arguments is a formula (a very
simple one at that). □


We have talked about alphabets and languages. We have defined alphabet. So, what 
is a language? Well, the language consists of all those “important” strings that we 
can build using the symbols of our alphabet. In Boolean logic, the set of important 
strings—the language—is the set of all formulae (WFF). In predicate logic, we 
have a richer language. Not only can we use the alphabet to write down statements 
(formulae), but we can also use it to write down objects (terms). 

We will need to define the syntax of terms first, for reasons that will become obvious.

Intuitively, the simplest objects are the variables and the constants. We can build
more complicated objects by applying functions to objects that we already have. For
example, in the language of number theory, x and y are simple objects; x + y is a bit
more complicated.

This is the idea behind the definition below, which is formulated as a “calculation” 
in the style of 1.1.3. 


4.1.4 Definition. (Term-Calculation or Term-Parse) 
A term-calculation (or term-parse) is any finite (ordered) sequence of strings that 
we may write respecting the following two requirements: 


(1) At any step we may write any symbol from L2 or NL1 of the alphabet (4.1.2). 


(2) At any step, if f is a function of arity n and we already have written down the
strings (without the quotes, of course) “t₁”, “t₂”, ..., “tₙ”, then we may write
the string “ft₁t₂...tₙ”. □


Imitating 1.1.5, we next define terms:

4.1.5 Definition. (Terms) A string t over the alphabet of 4.1.2 will be called a term
iff it is a string written at some step of some term-calculation.
The set of terms we will denote by Term. □
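The two requirements of Definition 4.1.4 can be checked mechanically. The following is a minimal Python sketch, not part of the formal development: it assumes that each string is given as a tuple of alphabet symbols, and the names `is_term_calculation`, `variables`, `constants`, and `arity` are illustrative choices.

```python
def is_term_calculation(steps, variables, constants, arity):
    """Check requirements (1) and (2) of a term-calculation, step by step."""
    written = []  # strings already written, each a tuple of symbols

    def splits_into(rest, n):
        # Can `rest` be cut into exactly n strings that were written earlier?
        if n == 0:
            return len(rest) == 0
        return any(
            rest[:len(t)] == t and splits_into(rest[len(t):], n - 1)
            for t in written
        )

    for step in steps:
        step = tuple(step)
        # Requirement (1): a single variable or constant symbol.
        atomic = len(step) == 1 and (step[0] in variables or step[0] in constants)
        # Requirement (2): a function symbol followed by n earlier strings.
        applied = (
            len(step) > 0
            and step[0] in arity
            and splits_into(step[1:], arity[step[0]])
        )
        if not (atomic or applied):
            return False
        written.append(step)
    return True
```

With variables x, y, constant a, a unary f and a binary g (all hypothetical), the sequence x, fx, a, gfxa passes the check, while the one-step sequence fx fails because x has not been written earlier.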


4.1.6 Remark. (1) We will use the generic symbols t and s, with or without subscripts
or primes, to denote terms. Thus, t₉₉ denotes some term. Therefore, these names are
syntactic variables (metavariables) for terms.

(2) In practice, we use a more friendly notation than “ft₁t₂...tₙ”. We write
instead “f(t₁, t₂, ..., tₙ)”. Note that the comma “,” is not in our alphabet, but in the
metalanguage we have infinite leeway when it comes to serving user-friendliness.

Similar conventions apply to specific languages. For example, in the language
of number theory the correct notation is “+xy”. However, one sacrifices absolute
syntactic correctness in the interest of user-friendliness and writes “x + y” instead.
These conventions are in the same spirit as the conventions regarding the elimination 
of “redundant brackets” and are made in the metatheory. 

(3) Analogous remarks to those in 1.1.6 apply here; thus, on one hand, one can do 
induction on the complexity of terms to metaprove properties of terms. We define the 
complexity of a term to be the number of function symbols—counting each repetition 
as a new occurrence—appearing in it. This is a natural measure as the complexity 
increases with every step such as (2) of Definition 4.1.4. Thus, x has complexity
0, fx (assuming the arity of f is 1) has complexity 1, fffx has complexity 3, and
gyffffx⁹² (where g has arity 2) has complexity 5.

On the other hand, an inductive (recursive) definition of terms is possible, as
follows. □


4.1.7 Definition. (Alternative (Recursive) Definition of Terms) The set of terms is 
the smallest set of strings, Term, that satisfies 


(1) All variables and constants are in Term. 


(2) If f is an n-ary (of arity n) function and t₁, t₂, ..., tₙ are in Term, then so is
ft₁t₂...tₙ. □
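The recursive definition translates directly into a recognizer for terms written in the formal (prefix, bracket-free) notation. This is a hedged sketch in Python under the assumption that a string is given as a list of symbols; `parse_term`, `is_term`, and `term_complexity` are illustrative names, and the complexity count follows Remark 4.1.6(3).

```python
def parse_term(tokens, pos, variables, constants, arity):
    """Return the position just past the term that starts at `pos`, or None."""
    if pos >= len(tokens):
        return None
    head = tokens[pos]
    if head in variables or head in constants:   # clause (1)
        return pos + 1
    if head in arity:                            # clause (2): f followed by n terms
        pos += 1
        for _ in range(arity[head]):
            pos = parse_term(tokens, pos, variables, constants, arity)
            if pos is None:
                return None
        return pos
    return None


def is_term(tokens, variables, constants, arity):
    """A string is a term iff one term spans it exactly."""
    return parse_term(tokens, 0, variables, constants, arity) == len(tokens)


def term_complexity(tokens, arity):
    """Number of function-symbol occurrences, repetitions included."""
    return sum(1 for tok in tokens if tok in arity)
```

For instance, with a unary f and a binary g, the string gyffffx is recognized as a term and its complexity is counted as 5, in agreement with the remark above.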


We can now define the simplest possible formulae, the ones we can write down 
without using any Boolean connective. 


4.1.8 Definition. (Atomic Formulae) The following are the atomic formulae of 
predicate calculus: 


(a) Any Boolean variable and any Boolean constant.
(b) The string t = s for any terms t and s (possibly, t and s name the same term).⁹³


(c) For any predicate φ of arity n, and any n terms t₁, t₂, ..., tₙ, the string φt₁t₂...tₙ.

We denote the set of all atomic formulae by AF. □


⁹²gyffffx in friendly (!) notation is written as g(y, f(f(f(f(x))))).
⁹³Let us recall that t and s are metavariables for terms.


4.1.9 Remark. (1) As in the case of terms, we opt for the friendly informal notation
with brackets and commas: Rather than the correct “φt₁t₂...tₙ” we abuse syntax
and write “φ(t₁, t₂, ..., tₙ)”.

(2) “=” is a very special logical (the only logical one!) 2-ary (or binary) predicate,
that of equality. Our syntactic rule uses a so-called infix notation for the associated
formula; “t = s” rather than “= ts”.

Some texts use “≈” (e.g., [13]) instead of “=” in order to avoid confusion with
the informal (metamathematical) “=”. We will not do that; rather we will allow the
context to fend for itself. Note that [17] uses “=” for at least three different roles:
informal equality, formal equality, and as a conjunctional alias of “≡” in equational
proofs.

We overload the “=” symbol a bit less by letting “⟺” perform the last role. We
will never let “=” stand for “≡”. □


We can finally define all formulae of predicate calculus exactly as we did with the
Boolean case, starting with the concept of formula-calculation.


4.1.10 Definition. (Formula-Calculation or Formula-Parse; First-Order Case) 
A formula-calculation (or formula-parse) is any finite (ordered) sequence of strings 
that we may write respecting the following four requirements: 


(1) At any step we may write any atomic formula (member of AF defined in 4.1.8). 


(2) At any step we may write the string (¬A), provided we have already written the
string A.


(3) At any step we may write any of the strings (A ∧ B), (A ∨ B), (A → B),
(A ≡ B), provided we have already written the strings A and B.


(4) At any step, and for any choice of variable x, we may write the string ((∀x)A),
provided we have already written the string A.


“∀” is called the universal quantifier and is pronounced “for all”. The string
“(∀x)” we pronounce “for all x”. We say that the subformula A in ((∀x)A) is
the scope of “(∀x)”. □


4.1.11 Remark. (a) Case (4) in 4.1.10 is new. It did not occur in Definition 1.1.3.
We say that x in ((∀x)A) is a quantified variable (we also call it bound, but more on
this shortly). In a first-order language, we are allowed to quantify only first-order
variables, as we call the object variables. We are not allowed to quantify second-
(or higher-) order variables such as names of predicates or functions. For example,
we have no way, in a first-order language, to write down a formula that says “for all
functions f ...”.

(b) Since cases (2)-(3) in the above definition are the same as in 1.1.3 and since,
moreover, AF includes all Boolean variables and constants, it follows that every
formula-calculation in the sense of 1.1.3 is also valid according to 4.1.10. □


The picky reader will probably ask: “You said ‘[F]or example, we have no way, in
a first-order language, to write down a formula that says “for all functions f ...”.’
But surely in set theory, which has been repeatedly mentioned as an ‘applied case’ of
first-order logic, we must be able to say things like ‘for all functions f ...’?” Well,
yes, we can. What I meant is that a first-order language does not have the ability to
say “(∀f)”, where f is a function symbol.

However, in set theory, functions are not symbols of the alphabet, but are defined,
as we say extensionally,⁹⁴ i.e., as sets of ordered pairs, pairs themselves being defined
(implemented) as certain sets.⁹⁵

Now, axiomatic set theory has normally just one type of variable, “set”. We can
certainly say “(∀x)” if x is an object (set) variable. Since a function, extensionally,
is a set, I can say “(for all functions)(. . .)” by saying instead “(∀x)(x is a function →
. . .)”.


4.1.12 Example. In the first step of any formula-calculation, only requirement (1) 
of Definition 4.1.10 is applicable, since the other three require the existence of prior 
steps. Thus in the first step we may write only an atomic formula. In all other steps, 
all the requirements (1)-(4) are applicable. 

Here is a calculation (the comma is not part of the calculation; it just separates
strings written in various steps):


p, ⊥, (¬⊤), q


Verify that the above obeys Definition 4.1.10.
Here is a more interesting one:


p, q, (p ∨ q), (p ∧ q), ((p ∨ q) ≡ q), ((p ∧ q) ≡ p), (((p ∨ q) ≡ q) ≡ ((p ∧ q) ≡ p))


Both previous calculations are also calculations in the sense of 1.1.3 and were
written down in an earlier example. Here are a few that are not, but which are valid
calculations as far as 4.1.10 is concerned:


p, ((∀z)p)
x = a, ((∀x)x = a), p, (((∀x)x = a) ∧ p), u = v, (u = v → (((∀x)x = a) ∧ p))
Recall that “a” is a constant.
x = y, (¬x = y), ((∀z)(¬x = y))
x = y, ((∀x)x = y), ((∀x)((∀x)x = y))

Particularly important are the first, “p, ((∀z)p)”, and the last two calculations: The
first two of these exemplify that we do not care whether a variable z occurs in “A”
before we form “(∀z)A”. The “A” here were p and (¬x = y) respectively.

The last one illustrates that it is allowed to place one (∀x) immediately in front of
another.

Of course, why these actions are legal is implicit in (4) of 4.1.10: Absolutely no
restrictions are placed on either x or A (other than A must be already written). □


⁹⁴That is, by what they include as members, not behaviorally or “intensionally”.
⁹⁵Normally via Kuratowski’s definition, by which the ordered pair “(x, y)” is an abbreviation of the set
{{x}, {x, y}}.
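The requirements of Definition 4.1.10 can be checked one step at a time. The sketch below is an illustration in Python under simplifying assumptions that are not in the text: formulae are represented as nested tuples rather than strings, the tags (`"bool"`, `"eq"`, `"pred"`, `"not"`, `"and"`, `"or"`, `"imp"`, `"equiv"`, `"forall"`) are hypothetical, and the internal structure of atomic formulae is not inspected.

```python
ATOMIC = ("bool", "eq", "pred")           # tags of atomic formulae (internals not checked)
BINARY = ("and", "or", "imp", "equiv")    # stand for ∧, ∨, →, ≡


def is_formula_calculation(steps):
    """Check requirements (1)-(4) of a formula-calculation, one step at a time."""
    written = []
    for A in steps:
        tag = A[0]
        if tag in ATOMIC:
            ok = True                                   # requirement (1)
        elif tag == "not":
            ok = A[1] in written                        # requirement (2)
        elif tag in BINARY:
            ok = A[1] in written and A[2] in written    # requirement (3)
        elif tag == "forall":
            ok = A[2] in written                        # requirement (4): any variable A[1]
        else:
            ok = False
        if not ok:
            return False
        written.append(A)
    return True


p = ("bool", "p")
x_eq_y = ("eq", "x", "y")
print(is_formula_calculation([p, ("forall", "z", p)]))            # True
print(is_formula_calculation([x_eq_y,
                              ("forall", "x", x_eq_y),
                              ("forall", "x", ("forall", "x", x_eq_y))]))  # True
print(is_formula_calculation([("forall", "x", x_eq_y)]))          # False
```

Note that requirement (4) places no condition on the variable, which is why quantifying over z in front of p, or stacking two (∀x), is accepted.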


4.1.13 Definition. (First-Order Formulae) A string A over the alphabet 4.1.2 will
be called a first-order formula or a well-formed formula iff it is a string written at
some step of some formula-calculation conducted as per 4.1.10.

The set of first-order formulae we will denote by WFF. A member of WFF is
often called a “wff” (or just a formula). □


My apologies to the reader for using “WFF” and “wff” both for the first-order
formulae of Part II and for the Boolean expressions of Part I.

In my defense, “WFF” from now on will be in the former (Part II) sense exclusively,
and, in any case, any Boolean expression is also a wff in the new (4.1.13)
sense as we remarked above (cf. 4.1.11(b)).


As in the cases of Boolean formulae, and members of Term, a string is a wff
(member of the set WFF defined in 4.1.13) iff this can be certified by showing that the
string is put together using certain strings that we already know are in WFF. And
again, as was done twice before, this leads to a recursive (inductive) definition of
WFF:


4.1.14 Definition. (Recursive Definition of WFF) WFF is defined as the smallest
set of strings that contains all the members of AF and moreover satisfies:

(1) If A is in WFF, then so are (¬A) and ((∀x)A).

(2) If A and B are in WFF, then so are (A ∧ B), (A ∨ B), (A → B), and
(A ≡ B). □


Clearly, the complexity of a wff increases every time we perform a step (2), (3), or (4)
in a formula-calculation (4.1.10). Thus we define


4.1.15 Definition. (Complexity of Members of WFF) The complexity of a wff is
the total number of occurrences of ∀, ¬, ∧, ∨, →, ≡ in the formula, counting each
repetition as a new occurrence. □
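Definition 4.1.15 amounts to a simple recursion on the way a formula is built. The following Python sketch is an illustration only; it assumes a nested-tuple representation of formulae with hypothetical tags (`"bool"`, `"eq"`, `"pred"` for atomic formulae, `"not"`, `"forall"`, and `"and"`, `"or"`, `"imp"`, `"equiv"` for the binary connectives).

```python
def complexity(A):
    """Count the occurrences of ∀, ¬, ∧, ∨, →, ≡ in a formula given as nested tuples."""
    tag = A[0]
    if tag in ("bool", "eq", "pred"):      # atomic formulae contribute nothing
        return 0
    if tag == "not":                       # ("not", B)
        return 1 + complexity(A[1])
    if tag == "forall":                    # ("forall", x, B)
        return 1 + complexity(A[2])
    # binary connective: (tag, B, C)
    return 1 + complexity(A[1]) + complexity(A[2])


x_eq_y = ("eq", "x", "y")
p = ("bool", "p")
print(complexity(("forall", "x", ("forall", "y", ("not", x_eq_y)))))   # 3
print(complexity(("forall", "y", ("and", ("not", x_eq_y), p))))        # 3
print(complexity(("and", ("forall", "y", ("not", x_eq_y)), p)))        # 3
```

The three printed values correspond to the formulae of Example 4.1.16.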


4.1.16 Example. x = y and p have complexity 0 each. ((∀x)((∀y)(¬x = y)))
has complexity 3. ((∀y)((¬x = y) ∧ p)) and (((∀y)(¬x = y)) ∧ p) each have
complexity 3.

But as an aside, note that in the first of the last two examples p is in the scope
of (∀y), whereas in the second it is not. □

4.1.17 Definition. (The Existential Quantifier) We introduce a new symbol in the
metatheory (an abbreviation, that is, not a formal symbol), ∃, called the existential
quantifier, pronounced “there exists” or “for some”:

For any formula A, the string ((∃x)A) (a string of the metatheory this is; not a
string of WFF) abbreviates the formal string (member of WFF) (¬(∀x)(¬A)). □
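Since ∃ is only an abbreviation, it can be expanded away mechanically. A small Python sketch, assuming a nested-tuple representation of formulae with hypothetical tags and helper names (`exists`, `show`); the printer writes every bracket required by Definition 4.1.13, so its output carries one more pair of brackets than the lightly abbreviated string shown in the definition.

```python
SYMBOL = {"and": "∧", "or": "∨", "imp": "→", "equiv": "≡"}


def exists(x, A):
    """Expand the abbreviation ((∃x)A) into its formal counterpart."""
    return ("not", ("forall", x, ("not", A)))


def show(A):
    """Write a formula with all the brackets of Definition 4.1.13."""
    tag = A[0]
    if tag == "bool":
        return A[1]
    if tag == "eq":
        return f"{A[1]} = {A[2]}"
    if tag == "not":
        return f"(¬{show(A[1])})"
    if tag == "forall":
        return f"((∀{A[1]}){show(A[2])})"
    return f"({show(A[1])} {SYMBOL[tag]} {show(A[2])})"


print(show(exists("x", ("eq", "x", "y"))))   # (¬((∀x)(¬x = y)))
```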


4.1.18 Remark. (For the reader who will consult [17]: Notation Translation)
Where we write ((∀x)A) and ((∃x)A), [17] uses


(∀x | : A) and (∃x | : A) respectively     (1)


If you are wondering, “What on earth do both “|” and “:” do, one next to the other?”
the reason is that (1) is a special case of


(∀x | B : A) and (∃x | B : A)


which in the standard notation of the computer science and mathematical literature
(which we follow) are written as


((∀x)(B → A)) and ((∃x)(B ∧ A))


respectively.

An expression like ((∀x)A) we pronounce “for all x, A holds”. An expression
like ((∃x)A) we pronounce “for some x, A holds”, but also “there exists an x, such
that A holds”.

In an expression such as ((∀x)A), x stands for a specific variable among x, y, z″₉₉,
etc., where we either do not know which, or do not care. Thus, intuitively, it says
“for all values of x ...”, not “for all variables x, z′, y₆₇ ...”. Similarly, by “for
some x, A holds” we understand “for some value of x, A holds”.

This pronunciation, as well as the clarification regarding “value”, are consistent 
with the intended meaning of quantifiers. This intended meaning not only tells 
us what is the right way to pronounce ((∀x)A) but will also guide us to choose
appropriate logical axioms (next section) that capture this intended meaning purely 
syntactically. 

I emphasize that our main task in predicate calculus will be to calculate theorems, 
that is, to write proofs. Semantic ideas—with the exception of those borrowed from 
Boolean logic—will have absolutely no role in the writing of our proofs. 

However, keeping our intuition informed and active, in particular understanding
what an expression like ((∀x)A) is meant to say, will often assist our imagination
toward discovering proofs. Anything goes in the discovery stage, even consulting
the Oracle at Delphi.⁹⁶ The actual writing stage is, however, restricted to be syntactic
(except as noted in the previous paragraph). □


⁹⁶How we guess the next step is our business. However, we write and document a proof according to
rules.

Before we go on, and in the interest of making notation friendlier (and sloppier),
we augment Remark 1.1.11 here to take care of the additional symbols of first-order
languages:


4.1.19 Remark. (Priorities and Bracket Reduction) Our previous agreement (in 
Remark 1.1.11) on how to be sloppy and get away with it remains essentially the 
same, now augmented to take care of ∀ as well:


Certain brackets are redundant—and hence can be removed—from a formula 
written according to Definitions 4.1.13 and 4.1.14, still allowing it to say the same 
thing as before: 


(1) Outermost brackets are redundant. 


As in 1.1.11, for the next two cases it is easiest to think of the process in reverse: 
how to reinsert correctly (as per Definition 4.1.13) any omitted brackets. 


(2) Any other pair of brackets is redundant, if its presence (as dictated by 4.1.13)
can be understood from the priority, or precedence, of the connectives. Higher-priority
connectives bind before lower-priority ones. That is, if we have a situation
where a subformula A of a formula has already been reconstructed as per 4.1.13,
and is claimed by two distinct connectives ∘ and ⋄, among those in (∗) below, as
in “... ∘ A ⋄ ...”, then the higher-priority connective “glues” first. This means
that the implied brackets are (reinserted as) “... ∘ A) ⋄ ...” or “... ∘ (A ⋄ ...”
according as ∘ or ⋄ has the higher priority respectively.


The order of priorities (decreasing from left to right, but (∀x) and ¬ having equal
priority) is agreed to be:

(∀x), ¬, ∧, ∨, →, ≡     (∗)


(3) In a situation like “... ∘ A ∘ ...”, where A has already been reconstructed as
per 4.1.13, and ∘ is any connective listed in (∗) above, other than ¬ or (∀x),
the right ∘ acts before the left. Thus the implied bracketing is “... ∘ (A ∘ ...”.


Similarly, ¬¬A is short for ¬(¬A), ¬(∀x)A is short for ¬((∀x)A), (∀x)¬A is
short for (∀x)(¬A), and (∀x)(∀y)A is short for (∀x)((∀y)A).

We say that all connectives are right associative. This applies to (∃x) as well
when this abbreviation is used; after all, the convention of this remark applies to
metatheoretical notation.


It is important to emphasize: 


(a) This “agreement” results in a shorthand notation. Most of the strings depicted by
this notation are not correctly written formulae, but this is fine: Our agreement
allows us to decipher the shorthand and uniquely recover the correctly written
formula we had in mind.


(b) The agreement on removing brackets is a syntactic agreement.
In particular, right associativity says simply that, e.g., p ∨ q ∨ r is shorthand for
(p ∨ (q ∨ r)) rather than ((p ∨ q) ∨ r). □


4.1.20 Example. Here are some examples of simplified notation:
(1) Instead of (u = v → (((∀x)x = a) ∧ p)) we may write simply


u = v → (∀x)x = a ∧ p


In the simplified notation the inexperienced reader may have some trouble readily
seeing that p is not in the scope of (∀x). The rule of thumb is


Whenever in doubt, use extra brackets!
(2) Instead of ((∀z)(¬x = y)) we write simply


(∀z)¬x = y
(3) Instead of ((∀x)((∀x)x = y)) we may write
(∀x)(∀x)x = y


Note that we do not want to eliminate the brackets around a quantifier, treating
“(∀x)” and the defined “(∃x)” as compound symbols. □
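The bracket-reinsertion process of Remark 4.1.19 can be imitated by a small precedence-driven parser. The following Python sketch is only an illustration under stated assumptions: the input is a well-formed shorthand given as a list of tokens, atomic formulae such as “x = a” are supplied as single tokens, and the names `PRIORITY` and `rebracket` are hypothetical. No error handling is attempted.

```python
PRIORITY = {"≡": 1, "→": 2, "∨": 3, "∧": 4}   # larger number = binds first


def rebracket(tokens):
    """Reinsert the brackets that the priority conventions allow us to omit."""
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "¬" or tok[:2] in ("(∀", "(∃"):
            return f"({tok}{operand()})"      # prefix symbols have the highest priority
        if tok == "(":
            inner = formula(1)
            pos += 1                          # skip the matching ")"
            return inner
        return tok                            # an atomic formula, taken as one token

    def formula(min_priority):
        nonlocal pos
        left = operand()
        while pos < len(tokens) and PRIORITY.get(tokens[pos], 0) >= min_priority:
            op = tokens[pos]
            pos += 1
            right = formula(PRIORITY[op])     # equal priority on the right: right associativity
            left = f"({left} {op} {right})"
        return left

    return formula(1)


print(rebracket(["p", "∨", "q", "∨", "r"]))
# (p ∨ (q ∨ r))
print(rebracket(["u = v", "→", "(∀x)", "x = a", "∧", "p"]))
# (u = v → (((∀x)x = a) ∧ p))
```

The second call recovers the fully bracketed formula of item (1) in Example 4.1.20, which makes visible that p is outside the scope of (∀x).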


If “(∀x)A” is meant to say that “for all values of x, A holds”, then this is analogous
to

\[ \sum_{i=1}^{4} i^2 \qquad (1) \]

which says, “For all integer values of i, from 1 to 4 inclusive, compute the result
i² and then add all these four results.” That is, compute

\[ 1^2 + 2^2 + 3^2 + 4^2 \]

Note that in expression (1) we are not allowed to substitute values into i. That is,
something like

\[ \sum_{3=1}^{4} 3^2 \qquad (2) \]

is totally meaningless as it would say, “For all integer values of 3, from 1 to 4
inclusive, compute the result 3² and then add all these four results.” Nonsense!
“3” cannot obtain any values other than 3.

Thus, i in (1) is unavailable for substitution. We say that it is a bound variable.
On the other hand, the expression

\[ \sum_{i=1}^{4} (i + x)^2 \qquad (3) \]

means

\[ (1 + x)^2 + (2 + x)^2 + (3 + x)^2 + (4 + x)^2 \]

Thus we may, if we wish, substitute specific values into x, say, 9, and compute

\[ (1 + 9)^2 + (2 + 9)^2 + (3 + 9)^2 + (4 + 9)^2 \]

for short:

\[ \sum_{i=1}^{4} (i + 9)^2 \]

Thus x in (3) is available for substitution, or as we say, free.
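The same distinction shows up in programming languages. In the Python sketch below (an illustration only; the function names `total` and `total_renamed` are hypothetical), the index of the generator expression plays the role of the bound variable i, while the parameter x is the free variable into which a caller may substitute a value.

```python
def total(x):
    # i is bound by the generator expression; x is free in it
    return sum((i + x) ** 2 for i in range(1, 5))


def total_renamed(x):
    # renaming the bound index changes nothing
    return sum((k + x) ** 2 for k in range(1, 5))


print(sum(i ** 2 for i in range(1, 5)))   # 30, the value of expression (1)
print(total(9))                           # 534, expression (3) with 9 substituted for x
print(total(9) == total_renamed(9))       # True
```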
Analogously, x is bound in the expression below, while z is free:


(∀x)x = z
We capture this by a definition.


4.1.21 Definition. (Bound and Free Occurrences) An occurrence of a variable x
in a formula A is characterized by the position (from left to right: first, second,
etc.) where x occurs as a substring in A (cf. 1.3.6). For example, the 3rd occurrence
of x in the following formula is shown boxed (the box is rendered here as [x]):


x = y → x = y ∨ (∀[x])x = z


We say that an occurrence of x in A is bound iff the occurrence is the x that occurs
in a substring (∀x) of A, or if it occurs in the scope of some (∀x) that occurs in A.

An occurrence of x in A that is not bound is called free. Below I show all bound
occurrences of x in the foregoing example singly boxed, as [x], and all free
occurrences doubly boxed, as [[x]]:


[[x]] = y → [[x]] = y ∨ (∀[x])[x] = z


We usually count bound occurrences, and free occurrences, from left to right,
separately; thus, we have above two bound occurrences, shown boxed, and two free
occurrences, shown doubly boxed. We also have two free occurrences of y and one
of z. □
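The classification of occurrences can be computed by carrying along the information “are we inside the scope of some (∀x)?”. This is a hedged Python sketch with an assumed nested-tuple representation: terms are `("var", name)`, `("const", name)` or `("fn", name, args)`, formulae use hypothetical tags as in the comments, and `occurrences` is an illustrative name.

```python
BINARY = ("and", "or", "imp", "equiv")


def term_vars(t):
    """Variables of a term, left to right, with repetitions."""
    if t[0] == "var":
        return [t[1]]
    if t[0] == "const":
        return []
    return [v for arg in t[2] for v in term_vars(arg)]    # ("fn", f, args)


def occurrences(A, x, in_scope=False):
    """Status ("bound" or "free") of each occurrence of x in A, left to right."""
    tag = A[0]
    status = "bound" if in_scope else "free"
    if tag == "bool":                                      # ("bool", p)
        return []
    if tag == "eq":                                        # ("eq", t, s)
        return [status for v in term_vars(A[1]) + term_vars(A[2]) if v == x]
    if tag == "pred":                                      # ("pred", φ, terms)
        return [status for t in A[2] for v in term_vars(t) if v == x]
    if tag == "not":
        return occurrences(A[1], x, in_scope)
    if tag == "forall":                                    # ("forall", y, B)
        if A[1] == x:
            # the x inside "(∀x)" is itself bound, and so is everything in its scope
            return ["bound"] + occurrences(A[2], x, True)
        return occurrences(A[2], x, in_scope)
    if tag in BINARY:
        return occurrences(A[1], x, in_scope) + occurrences(A[2], x, in_scope)
    raise ValueError(f"unknown formula tag: {tag!r}")


x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
# x = y → x = y ∨ (∀x)x = z
A = ("imp", ("eq", x, y), ("or", ("eq", x, y), ("forall", "x", ("eq", x, z))))
print(occurrences(A, "x"))   # ['free', 'free', 'bound', 'bound']
```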


4.1.22 Remark. We sometimes say, “The nth bound occurrence of x belongs to (or
is bound to) the mth occurrence of (∀x) in the formula A.” What do we mean by
that?

We mean one of the following:

(1) The nth bound occurrence of x is the x in the mth occurrence of (∀x),
or

(2) The scope of the mth occurrence of (∀x) is the shortest among the scopes, of
any (∀x), which contain the nth bound occurrence of x.


Thus, in
(∀[x])([x] = y ∨ (∀[[x]])[[x]] = z ∧ [x] = y)


the three boxed bound x (shown as [x]) belong to the first (leftmost) (∀x) while the
doubly boxed ones (shown as [[x]]) belong to the second one.


Similarly, in
(∀x)(∀x)[x] = y     (1)


the boxed x (shown as [x]) belongs to the second (rightmost) (∀x).

This process of finding where a bound variable x belongs (to which (∀x)) hinges
on formula-calculations: An x belongs to that (∀x) that took away its freedom in the
course of a formula-calculation.

For example, a formula-calculation for (1), with reduced bracket notation, is


x = y, (∀x)x = y, (∀x)(∀x)x = y □


This process of finding to which “(Vx)” a bound variable x belongs is analogous to 
that of finding the “block-head” where a local variable x is declared in a language 
like Algol or Pascal: One goes back (leftward in the program string) until one reaches 
the first occurrence of a declaration for x. 


4.1.23 Definition. (Subformulae) We define the concept “B is a subformula of A” 
inductively: 


(1) A is atomic. Means that the strings A and B are identical.
(2) A is ¬C. Means that either B is the same string as A, or it is a subformula of C.


(3) A is (∀x)C. Means that either B is the same string as A, or it is a subformula of
C.


(4) A is C ∘ D, where ∘ ∈ {∧, ∨, →, ≡}. Means that either B is the same string as
A, or it is a subformula of C, or of D (or both). □
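The four cases of Definition 4.1.23 give a direct recursion that lists all subformulae of a formula. A minimal Python sketch, assuming a nested-tuple representation of formulae with hypothetical tags; `subformulae` is an illustrative name.

```python
BINARY = ("and", "or", "imp", "equiv")


def subformulae(A):
    """All subformula occurrences of A, listed with A itself first."""
    tag = A[0]
    result = [A]                                   # A is a subformula of itself
    if tag == "not":                               # case (2): A is ¬C
        result += subformulae(A[1])
    elif tag == "forall":                          # case (3): A is (∀x)C
        result += subformulae(A[2])
    elif tag in BINARY:                            # case (4): A is C ∘ D
        result += subformulae(A[1]) + subformulae(A[2])
    return result                                  # case (1): atomic A adds nothing more


p = ("bool", "p")
x_eq_y = ("eq", "x", "y")
A = ("imp", p, ("forall", "x", ("not", x_eq_y)))   # p → (∀x)¬x = y
print(len(subformulae(A)))                         # 5
```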


4.1.24 Exercise. Show by induction on A, that if we replace any (possibly all)
occurrences of a subformula of A by some Boolean variable, then the resulting string
is a formula. □


4.1.25 Remark. (Abstractions) We saw that every Boolean expression as defined 
in 1.1.5 is also a first-order wff, as defined in 4.1.13 (cf. 4.1.11). The former is a 
special case of the latter. 

Much more useful to us, it turns out, is the fact that Boolean expressions are 
abstractions of first-order formulae. We already said so in the preamble of the 
current chapter (p. 113), but we will now make the case; the why and how. 

First the how: To “abstract” means to discard information that you do not need
so that you can focus on what really matters unhindered by unnecessary details. We
can view (first-order) formulae as Boolean formulae if we become oblivious to the
presence of all non-Boolean elements (those that speak of objects) and concentrate
instead on the Boolean structure, that is, how the Boolean connectives ¬, ∧, ∨, →, ≡
connect things up. The non-Boolean elements are, of course:


The (object) variables, constants, predicates, functions, “=”, and “∀”     (1)


How do we become oblivious to such elements of syntax, as enumerated in
(1), that occur in a first-order formula A?

We implement our unconcern by covering those elements up! That is, by first
identifying the shortest possible subformulae of A that contain such symbols and then
replacing said subformulae by new⁹⁷ Boolean variables.


Important! In the actual implementation of this procedure we do not really
replace these subformulae by new symbols. They themselves, as written, are
(names of) these new Boolean variables, a fact that automatically satisfies the
remark in the preceding footnote.

Just like the existing Boolean variables of the alphabet, such as p″₇₉₉₉,
these too are compound symbols, such as “x = y” or “(∀x)x = z”, the
elementary subsymbols of which, e.g., “(, ∀, x”, are invisible to Boolean logic
as separate syntactic entities, just like the individual subsymbols “7, 9, ′” of
p″₇₉₉₉ are invisible as symbols of any structural significance vis-à-vis 1.1.3.


Exercise 4.1.24 guarantees that these substitutions of subformulae by Boolean
variables generate formulae. Of course, viewing a first-order subformula as a new
Boolean variable, in effect, substitutes the subformula by the said variable.


In essence we are saying:


I do not care what these subformulae of A say about objects. I am only
interested in how these subformulae interconnect via ¬, ∧, ∨, → and ≡, that
is, in the Boolean structure of A.


For example (using simplified notation), if A is

p → x = y ∨ (∀x)φx ∧ q     (Note that q is not in the scope of (∀x))


then the abstraction is

p → p′ ∨ p″ ∧ q

where I used metavariables, in order to emphasize the form of the “abstraction”, p′
for “x = y” and p″ for “(∀x)φx”.

Needless to say, I view p′ and p″ as distinct, and different from any Boolean
variables of alphabet 4.1.2: the former because, by inspection, the actual names
“x = y” and “(∀x)φx” these metavariables stand for are distinct strings; the latter
by the “Important” italicized remark above. This obvious comment does not deserve
repetition.

If A is


x = y → x = y ∨ z = v     (1)

then the abstraction is

p → p ∨ q

again using the metavariables only for notational emphasis that indicates our indifference
to the exact first-order structure of x = y and z = v.

Note that the abstraction of (1) is a tautology.


⁹⁷“New” means that they do not already occur in A as members of first-order alphabet 4.1.2.

A more interesting example is the following, which shows that some of the shortest
subformulae that contain non-Boolean elements will be totally suppressed in the final
abstraction: The abstraction of

(∀x)(x = y → (∀z)z = a ∨ q)


is just a (new) Boolean variable, p, since the shortest subformulae that contain
non-Boolean elements have been identified (via enclosing boxes, rendered here as
square brackets) as follows:


[(∀x)([x = y] → [(∀z)[z = a]] ∨ q)]     (2)


so that the abstraction process, in slow motion, and working from inside out is: First
obtain


[(∀x)(q′ → [(∀z)q″] ∨ q)]     (3)


then


[(∀x)(q′ → q‴ ∨ q)]     (4)


and finally take care of this last box and call it, say, “p”.
Similarly we handle an A that is


p → x = y ∨ (∀x)(φx ∧ q)     (q is in the scope of (∀x))     (5)


Then the abstraction is marked as


p → [x = y] ∨ [(∀x)([φx] ∧ q)]


and in the final stage yields

p → p′ ∨ p″

It should be clear (cf. 4.1.26) that the subformulae of the following two types,
(a) atomic, but not Boolean, or (b) of the form ((∀x)A), are precisely the
ones that get abstracted into (i.e., name) new Boolean variables. Not all such
identified subformulae of some formula A need appear in the final result of the
abstraction of A (cf. (3) and (4) above; q′, q″, q‴ are lost). Subformulae of
the types (a) and (b) are called prime (e.g., [45, 53]).


But why do we want to abstract? Abstractions are extremely useful in predicate
logic. On one hand, we have the obvious benefit, simplification. On the other, by
allowing us to view first-order formulae as Boolean formulae, abstractions enable
us to use, in predicate logic, all the semantic (truth table) and syntactic (proof)
techniques that we have learned in Part I, including 3.2.1.


To be sure, additional techniques that allow us to handle “∀” and “=”,
which abstraction totally hides from view, will be necessary, so we must not
expect to reduce predicate calculus totally to propositional calculus!


Thus, in the context of first-order logic, we have a new view of Boolean variables:
A Boolean variable denotes a statement about objects, but we either do not
know what it says, or we do not care what it says about these objects.


With only a modicum of practice, we will find that we do not have to rewrite a
formula, introducing new Boolean metavariables, in order to view it abstractly. For
example, we should be able to see at once the (final) boxed stage that identifies the
Boolean structure of (5) (boxes shown as square brackets):

p → [x = y] ∨ [(∀x)(φx ∧ q)]


By the way, boxing the shortest subformulae in our process of abstraction is important
as it maximizes the number of Boolean connectives that remain “uncovered”.
We suppress details about objects but keep all the detail of the Boolean structure. □
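The abstraction procedure of this remark can be sketched as a recursion that stops at prime subformulae and covers them up. The Python below is an illustration under assumptions not made in the text: formulae are nested tuples with hypothetical tags, and a covered-up prime subformula is represented as `("bool", <the subformula itself>)`, in the spirit of the “Important!” note that the subformula itself serves as the name of the new Boolean variable.

```python
BINARY = ("and", "or", "imp", "equiv")


def abstract(A):
    """Cover up each prime subformula of A by a Boolean variable named after it."""
    tag = A[0]
    if tag == "bool":                      # already a Boolean variable or constant
        return A
    if tag in ("eq", "pred", "forall"):    # prime: atomic non-Boolean, or of the form (∀x)B
        return ("bool", A)                 # the subformula itself names the new variable
    if tag == "not":
        return ("not", abstract(A[1]))
    if tag in BINARY:
        return (tag, abstract(A[1]), abstract(A[2]))
    raise ValueError(f"unknown formula tag: {tag!r}")


p, q = ("bool", "p"), ("bool", "q")
x_eq_y = ("eq", "x", "y")
all_phi = ("forall", "x", ("pred", "φ", ("x",)))

# p → x = y ∨ (∀x)φx ∧ q   abstracts to   p → p′ ∨ p″ ∧ q
A = ("imp", p, ("or", x_eq_y, ("and", all_phi, q)))
print(abstract(A))

# (∀x)(x = y → (∀z)z = a ∨ q) abstracts to a single Boolean variable
B = ("forall", "x", ("imp", x_eq_y, ("or", ("forall", "z", ("eq", "z", "a")), q)))
print(abstract(B) == ("bool", B))          # True
```

Because the recursion never descends below a (∀x), the inner prime subformulae of B never surface in the result, which mirrors the loss of q′, q″, q‴ noted above.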


4.1.26 Exercise. Verify the statement I made earlier, in 4.1.25, namely, “It should be
clear that the subformulae of the following two types, (a) atomic, but not Boolean,
or (b) of the form ((∀x)A), are precisely the ones that get abstracted into (i.e., name)
new Boolean variables.”

Namely, show that any non-Boolean symbol of a formula belongs to a shortest
subformula of the type (a) or (b). □


We may now define: 


4.1.27 Definition. (Tautologies and Tautological Implications) We say that a first-order
formula A is a tautology, and write ⊨taut A, iff the abstraction of A is a
tautology. In first-order logic, we write Γ ⊨taut A iff the abstractions of the formulae
in Γ tautologically imply the abstraction of A. □
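Definition 4.1.27 can be tested by brute force with truth tables over the prime subformulae. The sketch below is an illustration in Python, assuming a nested-tuple representation with hypothetical tags (`"top"` and `"bot"` for the Boolean constants); prime subformulae are used directly as the names of Boolean variables, so no separate abstraction step is needed. The names `atoms`, `evaluate`, `taut_implies`, and `is_tautology` are illustrative.

```python
from itertools import product

PRIME = ("bool", "eq", "pred", "forall")   # parts that abstraction treats as Boolean variables


def atoms(A):
    """The Boolean variables of the abstraction of A."""
    tag = A[0]
    if tag in ("top", "bot"):
        return set()
    if tag in PRIME:
        return {A}
    if tag == "not":
        return atoms(A[1])
    return atoms(A[1]) | atoms(A[2])


def evaluate(A, v):
    """Truth value of the abstraction of A under the valuation v."""
    tag = A[0]
    if tag == "top":
        return True
    if tag == "bot":
        return False
    if tag in PRIME:
        return v[A]
    if tag == "not":
        return not evaluate(A[1], v)
    left, right = evaluate(A[1], v), evaluate(A[2], v)
    return {"and": left and right, "or": left or right,
            "imp": (not left) or right, "equiv": left == right}[tag]


def taut_implies(gamma, A):
    """Do the abstractions of the formulae in gamma tautologically imply that of A?"""
    names = sorted(set().union(atoms(A), *map(atoms, gamma)), key=repr)
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if all(evaluate(B, v) for B in gamma) and not evaluate(A, v):
            return False
    return True


def is_tautology(A):
    return taut_implies([], A)


x_eq_y, z_eq_v = ("eq", "x", "y"), ("eq", "z", "v")
print(is_tautology(("imp", x_eq_y, ("or", x_eq_y, z_eq_v))))   # True
print(is_tautology(("forall", "x", ("eq", "x", "x"))))         # False
```

The second call illustrates that a formula such as (∀x)x = x is not a tautology in this sense: its abstraction is a single Boolean variable.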


Before we leave the section on language, we need a few more definitions, notably 
concepts of substitution. 

As in algebra, we want to be able to substitute an object for a variable that accepts 
such substitutions (i.e., it is a free variable). But for the purpose of applying Leibniz 
we also want to be able to substitute a formula A into a Boolean variable p (cf. 1.3.15). 

For any terms s and t and variable x, the notation (in the metatheory) “s[x := t]”
will denote the result of replacing all original occurrences of x in s by t. Similarly,
for any formula A, variable x and term t, the (informal) name A[x := t] will mean
the result of replacing all original free occurrences of x in A by t.

In the case of a formula A, we must be careful when to allow such a substitution
to take place. For example, (∃x)¬x = y says, intuitively, that no matter what the
(value of) y, there is a (value of) x that is different.

We expect that the meaning that we just expressed in English should be independent 
of what name we use for the free variable y. 

Yet, if we allowed substitutions of terms into y recklessly we would in particular
allow the substitution


((∃x)¬x = y)[y := x]

which results in (∃x)¬x = x. But this has a totally different meaning from the
original: It says that there is a (value) x that is not equal to itself, a clearly absurd
suggestion!
This motivates us to disallow the completion of the operation of substitution if it
results in a free variable x getting in the scope of (∀x), getting captured, as we say.
This is taken care of in the following (recursive!) definition, which has two parts:
one for s[x := t] and one for A[x := t].


In order to read the following definitions on substitution correctly (“operations”
such as [x := t], [p := B] and [p \ B]), we emphasize that the operation takes
place in the metatheory and has the highest priority against all other “formal or
informal operations” such as ∀, ∃, =, ¬, ∧, ∨, →, ≡. For example, (∃x)A[p := B]
means (∃x){A[p := B]} and t = s[x := s′] means t = {s[x := s′]}, where the
symbols “{” and “}” are here meta-brackets inserted to indicate order of application of
“operations”. By abuse of notation (“(t = s)” being illegal) we write (t = s)[x :=
s′] for {t = s}[x := s′]. Once more, these operations are left-associative, e.g.,
A[p := B][q := C] means {A[p := B]}[q := C].


4.1.28 Definition. (Substitution of Terms into Variables) In what follows, we allow
“=” to appear only formally; thus, unlike Definition 1.3.15, instead of also using
“=” metatheoretically, we will use the verb is for equality between strings.


In the interest of generality, the definitions are given in terms of the metavariables 
x and y that name arbitrary variables. Since x and y name arbitrary variables, it is 
conceivable that they name the same variable. If we want to claim the opposite, we 
must explicitly say so: “x and y are different” or “x (not y)”. 


(1) The meaning of “s[x := t]” (i.e., its expansion) is given by induction
(recursion) on the complexity of the term s:


s[x := t] is

    s                                 if s is a constant or a variable (not x)
    t                                 if s is x
    f(s₁[x := t], ..., sₙ[x := t])    if s is f(s₁, ..., sₙ)


where we used the notation with brackets and commas (cf. 4.1.6) in the general case 
above. 

(2) The definition says that we are to replace every free occurrence of x in A by t,
but if for some y that occurs in t (free, of course) there is a subformula (∀y)B of
A where x occurs free, then we abort the operation and declare A[x := t] undefined.

The meaning (expansion) of “A[x := t]” is given by induction (recursion) on the
complexity of the formula A. We use reduced-brackets notation, as usual:


A[x := t] is

    φ(s₁[x := t], ..., sₙ[x := t])    if A is φ(s₁, ..., sₙ)
    s₁[x := t] = s₂[x := t]           if A is s₁ = s₂
    ¬C[x := t]                        if A is ¬C
    C[x := t] ∘ D[x := t]             if A is C ∘ D
    A                                 if A is one of p, ⊤, ⊥, (∀x)B
    (∀y)B[x := t]                     if A is (∀y)B,
                                      where y (not x) does not occur in t
                                      or x is not free in B
    undefined                         if A is (∀y)B,
                                      where y (not x) does occur in t
                                      and x is free in B


where ∘ above is one of ∧, ∨, →, ≡.

In each case above, the left-hand side, A[x := t], is defined iff all the needed
right-hand side substitutions are defined, e.g., C[x := t] and D[x := t] in the case
of ∘. □


The definition is pretty natural: 

In (1) the middle case (where s is x) is obvious. The case where s is a constant
is too: You cannot change the constant! As for when s is a y (that is, other than
x), there is no change in s either, for we are asking to change x but s neither
contains, nor is, x.

The last (inductive) case is also pretty understandable: How would one go about
plugging a 3 into x in, say, cos(sin(x)), i.e., what are the steps to do cos(sin(x))[x :=
3]? Well, we work from inside out: We first plug the 3 into the x of sin(x), to obtain
sin(3), and then we apply cos to sin(3) to get cos(sin(3)).

The last case of (1) simply generalizes this example: Think of cos as “f”, take
n = 1, and then think of sin(x) as “s₁” and 3 as “t”.
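
The three cases of part (1) can be mirrored in a short program. The following is only a sketch: the tuple encoding of terms and the name subst_term are assumptions made for illustration and are not part of the text.

```python
# Hypothetical encoding of terms as nested tuples (an assumption, not the book's):
#   ("var", name), ("const", name), ("fn", name, (arg_1, ..., arg_n))

def subst_term(s, x, t):
    """Return s[x := t], following the three cases of part (1) of 4.1.28."""
    kind = s[0]
    if kind == "const":
        return s                      # a constant cannot change
    if kind == "var":
        return t if s[1] == x else s  # s is x, or s is some other variable
    if kind == "fn":
        # inductive case: substitute in every argument, then reapply f
        return ("fn", s[1], tuple(subst_term(a, x, t) for a in s[2]))
    raise ValueError(f"not a term: {s!r}")

# cos(sin(x))[x := 3], worked from the inside out
cos_sin_x = ("fn", "cos", (("fn", "sin", (("var", "x"),)),))
three = ("const", "3")
result = subst_term(cos_sin_x, "x", three)
```

Here result encodes cos(sin(3)), while a variable other than x is returned unchanged.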

In (2) the points to emphasize are: 

(a) If A is (∀x)B then, intuitively, A does not depend on x: x is not free. So we
can plug t into x by doing nothing: We do not change A.

(b) In the case before the last we are told that to form A[x := t] we just form
B[x := t] first, and then we add the quantifier (∀y) up in front, as long as no
variable in t gets captured (p. 132) by this (∀y). Of course, the only variable that
can ever belong to (∀y) (cf. 4.1.22) is y itself, so the condition “y (not x) does not
occur in t” is sufficient (but not necessary) to avoid capture. The condition “x is not
free in B” is also sufficient by itself as then ((∀y)B)[x := t] is just (∀y)B, the same
as before the substitution. See Exercise 4.1.33 below.

(c) In the last case, there will be capture, if we go ahead with the substitution,
since at least one occurrence of x is free in B. Thus, at least one x of B will be
replaced by t, causing a y in t to bind with the (∀y). We have agreed that in this case
we must abort (cf. the discussion following 4.1.27).

Thus we say that the substitution as requested cannot be performed. For short,
the result of the operation “A[x := t]” in this case is unavailable; it is undefined.
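
Part (2) of Definition 4.1.28 can be sketched in the same way. The tuple encoding, the function names, and the use of None for “undefined” are all assumptions made for illustration; ∃ is not encoded separately since it abbreviates ¬(∀x)¬.

```python
# Hypothetical tuple encoding (an assumption, not the book's notation):
#   terms:    ("var", v), ("const", c), ("fn", f, (args...))
#   formulae: ("eq", s1, s2), ("pred", phi, (args...)), ("not", C),
#             ("bin", op, C, D), ("bvar", p), ("top",), ("bot",), ("forall", y, B)

def term_vars(s):
    """Set of variables occurring in the term s."""
    if s[0] == "var":
        return {s[1]}
    if s[0] == "const":
        return set()
    return set().union(*(term_vars(a) for a in s[2]))

def free_vars(A):
    """Set of variables occurring free in the formula A."""
    kind = A[0]
    if kind == "eq":
        return term_vars(A[1]) | term_vars(A[2])
    if kind == "pred":
        return set().union(*(term_vars(a) for a in A[2]))
    if kind == "not":
        return free_vars(A[1])
    if kind == "bin":
        return free_vars(A[2]) | free_vars(A[3])
    if kind == "forall":
        return free_vars(A[2]) - {A[1]}
    return set()  # Boolean variable, top, bot

def subst_term(s, x, t):
    if s[0] == "var":
        return t if s[1] == x else s
    if s[0] == "const":
        return s
    return ("fn", s[1], tuple(subst_term(a, x, t) for a in s[2]))

def subst(A, x, t):
    """Return A[x := t], or None when the substitution is undefined."""
    kind = A[0]
    if kind == "eq":
        return ("eq", subst_term(A[1], x, t), subst_term(A[2], x, t))
    if kind == "pred":
        return ("pred", A[1], tuple(subst_term(a, x, t) for a in A[2]))
    if kind == "not":
        C = subst(A[1], x, t)
        return None if C is None else ("not", C)
    if kind == "bin":
        C, D = subst(A[2], x, t), subst(A[3], x, t)
        return None if C is None or D is None else ("bin", A[1], C, D)
    if kind == "forall":
        y, B = A[1], A[2]
        if y == x:
            return A                      # x is not free in (forall x)B
        if y not in term_vars(t) or x not in free_vars(B):
            Bs = subst(B, x, t)
            return None if Bs is None else ("forall", y, Bs)
        return None                       # y would be captured: abort
    return A                              # Boolean variable, top, bot

x, y = ("var", "x"), ("var", "y")
no_quantifier = subst(("eq", x, y), "y", x)               # (x = y)[y := x]
captured = subst(("forall", "x", ("eq", x, y)), "y", x)   # ((forall x)x = y)[y := x]
```

In this sketch no_quantifier encodes x = x, while captured is None because the incoming x would fall in the scope of (∀x).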

We conclude our syntactic preliminaries by expanding Definition 1.3.15 to first- 
order languages. As we said before, we need something like “replace all occurrences 
of p in a formula A by the formula B” in order to make the Leibniz rule formulation 
friendly. 

It turns out that we will need two concepts of formula substitution, one of which
will be denoted analogously with A[x := t] as A[p := B], and, also analogously,
it will not be allowed to proceed in case of variable capture; it will be undefined
(“impossible”) in that case. We may call this substitution conditional.

We will also benefit from the presence of an “unconditional” substitution, one that
we allow to proceed regardless of any capture. This will be denoted by A[p \ B].

At this point, having two types of substitution into Boolean variables may look 
like an extra burden, but I promise that the usefulness of both will be evident before 
too long. 


4.1.29 Definition. (Unconditional Substitution) The unconditional substitution of
a formula into all occurrences of a Boolean variable p in a formula A is denoted by
A[p \ B] and is defined almost exactly as in 1.3.15, by adjusting the atomic case and
by adding a clause for “(∀x)”.


A[p \ B] is

    B                        if A is p
    A                        if A is in AF but is not p
    ¬C[p \ B]                if A is ¬C
    C[p \ B] ∘ D[p \ B]      if A is C ∘ D
    (∀x)C[p \ B]             if A is (∀x)C

□


The above is straightforward. The inductive cases go through without restrictions,
and what they do is to apply the substitution to the immediate subformulae of A, and
then apply the appropriate connective (Boolean, or the quantifier, as the case may
be). The immediate subformula of (∀x)C is C, i.e., the formula on which we applied
“(∀x)” to get (∀x)C during the relevant formula-calculation.

Case one is clear cut, but so is the second: The only way an atomic formula A 
can contain p is to be p. So if it is not, plugging B to p is irrelevant; it does not 
change A. 

The conditional substitution is defined similarly to the unconditional one, but the 
former will disallow the last case in the definition above if capture occurs. This will 
render the result of the operation A[p := B] undefined. 


4.1.30 Definition. (Conditional Substitution) Conditional substitution of a formula
into all occurrences of a Boolean variable p in a formula A is denoted by A[p := B].
The result of this substitution will be undefined if capture of a variable occurs at any
step. Below we will use reduced-bracket notation:


A[p := B] is

    B                          if A is p
    A                          if A is in AF but is not p
    ¬C[p := B]                 if A is ¬C
    C[p := B] ∘ D[p := B]      if A is C ∘ D
    (∀x)C[p := B]              if A is (∀x)C and x is not free in B
                               else undefined


In each case above, the left-hand side, A[p := B], will be defined iff so are all the
contributing substitutions in the right-hand side. □


Intuitively, it is immediately evident that whenever A[p := B] is defined, it
expands as (stands for) the same string that A[p \ B] stands for. We can actually
prove this by induction on the complexity of A.

For atomic A this is obvious; indeed, the “whenever” is superfluous: Both A[p :=
B] and A[p \ B] are defined in this case, period.


We have three more cases, precisely the ones in 4.1.30 and 4.1.29:

(1) Suppose A is ¬C.

Let A[p := B] be defined. By 4.1.30 this is ¬C[p := B], and C[p := B] is
defined. By the I.H.


C[p := B] is the same as C[p \ B] (*)


By 4.1.29, A[p \ B] is ¬C[p \ B]. By (*), A[p \ B] and A[p := B] are the same.

(2) Suppose A is C ∘ D (∘ is any of ∧, ∨, →, ≡). This case is similar to the
previous, so I leave it to the reader.

(3) Suppose A is (∀x)C. Let A[p := B] be defined.

By 4.1.30, this is (∀x)C[p := B], and C[p := B] is defined, and x is not free in
B.

Now, the I.H. applies to the less complex C (immediate subformula of A), hence
by the middle conjunct of the preceding statement, again (*) above holds. By
Definition 4.1.29, which is totally indifferent to “and x is not free in B”, A[p \ B]
is (∀x)C[p \ B]. By (*), A[p \ B] and A[p := B] are the same once more. We are
done.
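
The two Boolean-variable substitutions, and the fact just proved, can be tried out on small cases. This is a sketch under assumptions: the tuple encoding, the function names, and None standing for “undefined” are illustrative choices, and terms are restricted to variables and constants to keep the block short.

```python
# Hypothetical encoding (an assumption, not the book's notation):
#   terms:    ("var", v), ("const", c)
#   formulae: ("eq", s1, s2), ("bvar", p), ("not", C), ("bin", op, C, D),
#             ("forall", x, C)

def free_vars(A):
    """Set of variables occurring free in A."""
    kind = A[0]
    if kind == "eq":
        return {s[1] for s in (A[1], A[2]) if s[0] == "var"}
    if kind == "not":
        return free_vars(A[1])
    if kind == "bin":
        return free_vars(A[2]) | free_vars(A[3])
    if kind == "forall":
        return free_vars(A[2]) - {A[1]}
    return set()  # Boolean variable

def subst_uncond(A, p, B):
    """A[p \\ B] as in 4.1.29: always defined, capture is allowed."""
    kind = A[0]
    if kind == "bvar":
        return B if A[1] == p else A
    if kind == "not":
        return ("not", subst_uncond(A[1], p, B))
    if kind == "bin":
        return ("bin", A[1], subst_uncond(A[2], p, B), subst_uncond(A[3], p, B))
    if kind == "forall":
        return ("forall", A[1], subst_uncond(A[2], p, B))
    return A  # atomic and not p

def subst_cond(A, p, B):
    """A[p := B] as in 4.1.30, or None when undefined."""
    kind = A[0]
    if kind == "bvar":
        return B if A[1] == p else A
    if kind == "not":
        C = subst_cond(A[1], p, B)
        return None if C is None else ("not", C)
    if kind == "bin":
        C, D = subst_cond(A[2], p, B), subst_cond(A[3], p, B)
        return None if C is None or D is None else ("bin", A[1], C, D)
    if kind == "forall":
        if A[1] in free_vars(B):
            return None  # x is free in B: undefined, read literally from 4.1.30
        C = subst_cond(A[2], p, B)
        return None if C is None else ("forall", A[1], C)
    return A  # atomic and not p

x, y = ("var", "x"), ("var", "y")
forall_x_p = ("forall", "x", ("bvar", "p"))
unconditional = subst_uncond(forall_x_p, "p", ("eq", x, y))  # ((forall x)p)[p \ x = y]
conditional = subst_cond(forall_x_p, "p", ("eq", x, y))      # ((forall x)p)[p := x = y]
```

In this sketch unconditional encodes (∀x)x = y, conditional is None, and for a B in which x is not free (say y = y) the two functions return the same tuple.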


4.1.31 Example. We calculate a few substitutions according to Definitions 4.1.28, 
4.1.29, and 4.1.30. 


(1) (x = y)[y := x].

This (by 4.1.28) is x[y := x] = y[y := x], which again by 4.1.28 is x = x.

(2) ((∀x)x = y)[y := x]. Using 4.1.28, this is undefined because it falls under
the last case in the definition (part (2)).

(3) (∀x)(x = y)[y := x]. According to our redundant-brackets convention, this
is (∀x){(x = y)[y := x]}.

This does not ask that we form the substitution in (2) above; it rather asks (in a
roundabout way, using (1) above), “Apply (∀x) to x = x”, which is fine.⁹⁸ The final
answer is (∀x)x = x.


(4) ((∀x)(∀y)φ(x, y))[y := x].

According to 4.1.28, we are asked to do ((∀y)φ(x, y))[y := x] and then apply
(∀x). We can do this if y is not free in (∀y)φ(x, y).

Luckily, it is not. So the final result is (∀x)(∀y)φ(x, y) since ((∀y)φ(x, y))[y :=
x] is just (∀y)φ(x, y) by the third case (from the bottom) in the definition of A[x := t].


(5) (z = a ∨ (∀x)x = y)[y := x]. This requires us to calculate the following two
substitutions first:

(a) (z = a)[y := x]

(b) ((∀x)x = y)[y := x]

and then connect them with a “∨” if both are defined. The substitution requested
in (b) is undefined by (2) above; hence the whole thing is undefined even though
part (a) is defined. For the record, part (a) is calculated as z[y := x] = a[y := x],
that is, z = a (cf. (1) in Definition 4.1.28).


(6) ((∀x)p)[p \ x = y] is (∀x)x = y.

(7) ((∀x)p)[p := x = y] is undefined (x in x = y gets captured). □


4.1.32 Exercise. Prove by induction on the complexity of A that A[p \ B] is⁹⁹ a
formula, and that so are A[x := t] and A[p := B] whenever defined. □


4.1.33 Exercise. Prove by induction on the complexity of A that if x is not free in
A, then, for any term t, A[x := t] is A. □


4.1.34 Exercise. Intuitively, A[x := x] is A since we change nothing in A when we
replace x by x. But just for the sake of doing yet another proof by induction, prove
by induction on the complexity of A that indeed A[x := x] expands as A. □


4.1.35 Example. This is an important example, as we will quote it in the crucial 
“dummy renaming metatheorem” later on (6.4.4). 

Assume that z does not occur in A (as either a free or a bound variable). As we 
say, z is fresh. That A[x := z][z := x] should be just A is pretty much “obvious”, 
right? 

Pause. The term obvious is very useful in mathematics. It leads to the technique 
of proof by intimidation. You just say, “It is obvious”, and no one who does not want 
his intelligence questioned will dare ask you “why”! 


⁹⁸ This is because (x = y)[y := x] is x = x by part (1) above.
⁹⁹ “Is” here is argot for expands as.

You change x into z and then z back into x. This should get you back to where
you started, namely, A. Shouldn’t it?

Well, let A be (∀z)x = z. Then A[x := z] is undefined (i.e., not a formula) and
therefore so is A[x := z][z := x]. Or, let A be x = z. Then A[x := z] is z = z
and A[x := z][z := x] is x = x, which is not A, unless x and z denote the same
variable. Thus the stated condition is necessary.


But is it sufficient? Yes, as we see by induction on the complexity of A (under
the stated condition), i.e.,


A[x := z][z := x] expands as A (1)


Implicit in (1) is that both substitutions A[x := z] and {A[x := z]}[z := x]
are defined, where { } are metabrackets indicating here a completed operation.


We look only at the case of distinct variables x and z, since otherwise we are looking 
at A[x := x][x := x], which is A, because A[x := x] is also A (4.1.34). 

To handle the atomic case we need to handle the case of terms, since, e.g., t = s is
one of the atomic cases. So we prove first, by induction on the complexity of terms
t, that

t[x := z][z := x] is t as long as z does not occur in t (2)


There are three cases (Definition 4.1.28, part (1)):


Case 1: t is x. Then t[x := z] is z and z[z := x] is x; thus t[x := z][z := x] is x,
that is, t.


Case 2: t is y, other than x (and also other than z by hypothesis) or t is a constant
a. Then t[x := z] is t (y or a), and since y is not z, t[z := x], i.e.,
t[x := z][z := x], is still t in either case (y or a).


Case 3: t is f(s₁, ..., sₙ). Now t[x := z][z := x] is f(s₁[x := z][z := x], ..., sₙ[x
:= z][z := x]), which by the I.H. (applicable since all the sᵢ are less complex
than t) is f(s₁, ..., sₙ), that is, t.
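
Claim (2) and the need for its hypothesis can be checked on small terms. The sketch below reuses a hypothetical tuple encoding of terms; the names subst_term, occurs and round_trip are assumptions made for illustration.

```python
# Hypothetical encoding of terms (an assumption, not the book's):
#   ("var", name), ("const", name), ("fn", name, (args...))

def subst_term(s, x, t):
    """s[x := t] by the three cases of 4.1.28, part (1)."""
    if s[0] == "var":
        return t if s[1] == x else s
    if s[0] == "const":
        return s
    return ("fn", s[1], tuple(subst_term(a, x, t) for a in s[2]))

def occurs(z, s):
    """True iff the variable z occurs in the term s."""
    if s[0] == "var":
        return s[1] == z
    if s[0] == "const":
        return False
    return any(occurs(z, a) for a in s[2])

def round_trip(t, x, z):
    """t[x := z][z := x], read left-associatively."""
    return subst_term(subst_term(t, x, ("var", z)), z, ("var", x))

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
fresh_case = ("fn", "f", (x, ("fn", "g", (y,))))  # z does not occur
stale_case = ("fn", "f", (x, z))                  # z occurs
```

With z fresh, round_trip(fresh_case, "x", "z") gives back fresh_case; for stale_case the result encodes f(x, x), which differs from the starting term.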


Now that (2) is proved, we turn to (1). For A atomic (following part (2) of 4.1.28) we
find that the claim is trivial in the cases where A is other than t = s or φ(s₁, ..., sₙ).
Even these cases are immediate with (2) settled. For example, (t = s)[x := z][z :=
x] is t[x := z][z := x] = s[x := z][z := x], which by (2) above is t = s.

The claim also clearly “propagates” with the Boolean formation rules. That is,
(cf. 4.1.10, parts (2) and (3)) if the immediate subformulae of A (i.e., B if A is ¬B;
B and C if A is B ∘ C, where ∘ is one of ∧, ∨, →, ≡) satisfy the claim, then so does
A itself.

Consider then the two subcases where A is either (∀x)B or (∀w)B where w and
x are different. Note that under our assumptions, the subcase (∀z)B does not apply.
In the first subcase A[x := z] is A (cf. 4.1.28, 3rd case from the bottom). Since, by
assumption, z is not free in A, we get that A[x := z][z := x], that is, A[z := x], is
just A (Exercise 4.1.33).
The second subcase now is displayed in the calculation below:


A[x := z][z := x]   is   ((∀w)B)[x := z][z := x]
                    is   ((∀w)B[x := z])[z := x]      by 4.1.28 + I.H.
                    is   (∀w)B[x := z][z := x]        by 4.1.28 + I.H.¹⁰⁰
                    is   (∀w)B                        by I.H.
                    is   A   □


One last useful syntactic concept that we need to formulate our axioms is that of 
partial generalization. 


4.1.36 Definition. We say that B is a partial generalization of A if B is formed by
prefixing A with zero or more expressions such as (∀x) for any choice of the bound
variable x. □


4.1.37 Example. Here is a list of some partial generalizations of x = z:


x = z     (∀w)x = z     (∀x)(∀x)x = z     (∀x)(∀z)x = z
(∀z)(∀x)x = z     (∀z)(∀z)(∀z)(∀x)(∀z)x = z     □


The preceding definition may be rephrased recursively:


4.1.38 Alternative Definition. A is a partial generalization of A. If B is a partial
generalization of A, then the same is true of ((∀x)B) for any choice of the bound
variable x. □
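
Both formulations of partial generalization are easy to state as code. This is a sketch only; the tuple encoding and the names generalize and is_partial_generalization are assumptions made for illustration.

```python
# Hypothetical encoding (an assumption): a quantified formula is
# ("forall", x, B); any other tuple is treated as an opaque formula.

def generalize(A, variables):
    """Prefix A with (forall v) for each v in `variables`, leftmost first."""
    for v in reversed(variables):
        A = ("forall", v, A)
    return A

def is_partial_generalization(B, A):
    """True iff B is A prefixed by zero or more universal quantifiers."""
    while True:
        if B == A:
            return True          # zero further quantifiers to strip
        if B[0] != "forall":
            return False         # nothing left to strip and B is not A
        B = B[2]                 # strip one leading quantifier

x_eq_z = ("eq", ("var", "x"), ("var", "z"))
g = generalize(x_eq_z, ["z", "x"])   # encodes (forall z)(forall x) x = z
```

The loop follows 4.1.36 directly, while generalize builds formulae in the manner of the recursive rephrasing 4.1.38, adding one quantifier at a time.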


4.2 AXIOMS AND RULES OF FIRST-ORDER LOGIC 


At the beginning of Section 1.4, I said “Boolean logic is a (crude) vehicle through 
which we formulate and explore mathematical truth.” To this end, the axioms of 
Boolean logic determine how the Boolean connectives and the constants ⊥ and ⊤
behave. They postulate the a priori truths from which we proceed to discover more 
and more truths in mathematics. 

Note how I used the qualifier crude and elaborated further on (p. 113) that “I regret 
to say that [Boolean logic] is totally insufficient [toward discovering mathematical 
truth]” because it manipulates mathematical statements “by name” only, and is thus 
incapable of seeing inside the first-order structure of the statements; therefore it does 
not allow us to talk about objects, quantifiers, or equality. 


¹⁰⁰ The I.H. relevance stems from the italicized remark following (1) on p. 137.

Enter first-order logic. Its axioms must still determine the behavior of Boolean
connectives and constants, but they must also tell us how the quantifier “∀” and
equality “=” behave. Naturally, then, the axiom set for first-order logic includes all
the axioms of Boolean logic.

Just as in the case of Boolean logic, one chooses the axioms of predicate calculus
very carefully so that they express “universally acceptable” (i.e., application-independent)
principles. As such, an axiom like “(∀x)(¬x + 1 = 0)”, while
valuable for the study of the application known as number theory, which is about
the properties of natural numbers, has no place as an acceptable universal principle,
and will not be included.¹⁰¹


4.2.1 Definition. (Logical Axiom Schemata of Predicate Logic) In what follows,
A, B, C stand for arbitrary formulae, and x for an arbitrary variable.

The set of logical axioms of first-order logic consists of all possible partial
generalizations of the formulae in the following groups, Ax1-Ax6:


Ax1. This group contains all tautologies (cf. 4.1.27).

     For example, p ∨ ¬p, x = 0 ∨ ¬x = 0 and r → r ∨ q are each included. So is
     ¬x = 5 ≡ x = 5 ≡ ⊥.

     Although we present predicate calculus in a format suitable for all applications,
     we have already said that in examples we reserve the right to use symbols from
     specific applications, such as the “0” (or the “5”, short for SSSSS0) of the
     language where number theory is spoken.


Ax2. This group contains all formulae of the form (∀x)A → A[x := t].

     It has the name specialization axiom but also substitution axiom.


Ax3. This group contains all formulae of the form (∀x)(A → B) → (∀x)A →
     (∀x)B (I have used least parenthesized notation; cf. 4.1.19).


Ax4. This group contains all formulae of the form A → (∀x)A, where x is not free
     in A.

     Thus x = 0 → (∀y)x = 0 is included but x = 5 → (∀x)x = 5 is not.


Ax5. This group contains all formulae of the form x = x. The name of this axiom
     group is “identity axiom (group)”. Again we have infinitely many axioms in
     the group, even before the application of partial generalization, because there
     are infinitely many instances of the metavariable x.


Ax6. This group contains all formulae of the form t = s → (A[x := t] ≡ A[x :=
     s]). It is called the “Leibniz axiom (group) for equality”.


¹⁰¹ For example, (∀x)(¬x + 1 = 0) is a false statement about the real numbers, for it would say that
“for every real number x it is true that if you add 1 to it, the result will be different from zero”. Yet
1 + (−1) = 0. This axiom is actually included as a special or nonlogical axiom, if one wants to just do
number theory. It is part of the so-called Peano axioms for number theory or arithmetic.

We will denote by Λ₁ the set of all logical first-order axioms defined here in order to
distinguish it from the Λ of 1.4.4. Think of the subscript “1” as a reminder that we
are talking about 1st-order axioms here! □


Of course, Λ is a proper (i.e., not equal) subset of group Ax1. Even though it
may look like overkill to include all tautologies in group Ax1, doing so has both
a technical and practical basis. This approach is encountered in the literature, e.g.,
[13, 33, 50, 51, 53]. The foundation here is that of [50, 51].

In a correctly founded first-order logic we can certainly prove Post’s theorem. 
Thus, technically, whether or not we include all tautologies as axioms, we will have 
them as theorems, anyway. Practically, Post’s theorem already, in Part I, has given 
us the license to invoke without proof, within a formal proof, whichever tautologies 
are known to us exactly in the same manner that we invoke axioms. Moreover, the 
technical demands of writing predicate calculus proofs justify our putting to rest the 
until-now meticulous syntactic proofs of tautologies and tautological implications 
that we practiced in Part I, and to concentrate instead on mastering the difficulties 
that are peculiar to the handling of quantifiers and other non-Boolean elements. 

Rest assured that whichever such tautologies (or tautological implications) we 
invoke in practice are of a very trivial nature and are readily recognizable as such. 


4.2.2 Remark. We argue here, intuitively, of course, that the axioms listed above
are indeed universal principles, unbiased by considerations specific to particular
theories.¹⁰²

(1) That the axioms in group Ax1 express universally true principles stems from 
our intuitive understanding of the term tautology. 

(2) The name specialization of Ax2 stems from what it says: “If a statement (here 
A) is true for all (values) of x, then it must be true of any special value t that we are 
allowed to plug into x.” The principle in quotes is valid no matter what application 
of logic we may have in mind. 

Which part says “that we are allowed”? The “[x := t]” part, of course, as you
will recall from 4.1.28.

Definition 4.1.28 keeps Ax2 honest, and true, since substitution is not always
allowed.

For example, take A to be (∃y)¬x = y. The formula in (*) below is not included
in group Ax2, since we are not allowed to perform the substitution “((∃y)¬x =
y)[x := y]” to get “(∃y)¬y = y”.

This is just as well, because (*) states an invalid principle! It states, “If for every
value of x there is a value of y that is different, then there is some value of y that is
different from itself.”


(∀x)(∃y)¬x = y → (∃y)¬y = y (*)


¹⁰² Theory was defined on p. 116.

Why is (*) invalid? Because the if part does have true instances (e.g., no matter
which real number you pick, I can find one that is different) but the then part is never
true!

(3) Let us see now why, intuitively, every instance of the schema


(∀x)(A → B) → (∀x)A → (∀x)B (†)


of group Ax3 expresses a universally valid principle. Well, let us freeze for our
discussion the formulae A and B and the variable x. Statement (†) says:


If (∀x)(A → B) holds (i)

and
if (∀x)A holds (ii)

then
(∀x)B holds (iii)


Let us then suppose the informal statements (i) and (ii) and conclude (iii).

Statement (i) says, in words, “Any value of x that makes A true, also makes B
true.” Statement (ii) says, “A holds for all values of x.” Hence, by (i), B holds for
all values of x; we have got (iii)!

(4) The universal validity of the principle expressed by the schema in group Ax4 
is easy to verify: You see, if x is not free in A, this means, intuitively, that what 
A says is independent of x. Therefore proclaiming “For all x, A holds” or just “A 
holds” makes no difference. 

Have I not just argued that A ≡ (∀x)A is a correct universal principle? Yes, I did!

Note, however, that mathematicians (and logicians) prefer to say as little as
necessary in their axioms. This is why the group Ax4 is formulated as above, with “→”
rather than with “≡”. The ← direction¹⁰³ need not be stated since it can be proved
formally to follow from Ax2. For now, in outline, let me only indicate that such a
“proof” would be based on the following observations: (∀x)A → A[x := x] is in
Ax2 and we know that A[x := x] is just A (cf. 4.1.34).

(5) The schema x = x expresses an obviously valid universal principle: An object 
is the same (equal) as itself, no matter what this object may be. 

(6) Schema Ax6 also expresses a universally valid principle. It says, “If two
objects t and s are the same, then for any property A, either both t and s have the
property, or neither does.” At the informal level, the intuitive concept of property is
synonymous with that of statement. Indeed, a statement P determines a property,
shared by all objects that make P true. Conversely, a property P determines the
statement “x has P”.

The origins of Ax6 are traced back to Leibniz’s characterization of equality
between objects. He suggested that two objects are equal iff they have exactly the same
properties. In symbols, Leibniz’s statement is captured by


t = s ≡ (∀P)(P[x := t] ≡ P[x := s]) (**)


¹⁰³ In a Ping-Pong argument; cf. 3.4.5.

Of course, the language of first-order logic does not allow us to write (∀φ), let alone
(∀P). So how can we translate (**) into our first-order language? Our translation is
forgetful. First, we forget the ← direction of (**) and are content with stating just


t = s → (∀P)(P[x := t] ≡ P[x := s]) (***)


Second, not being allowed to use “(∀P)” to express “for all P”, we do the next best
thing: We exhaustively postulate all the infinitely many formulae of the form


t = s → (P[x := t] ≡ P[x := s]) (‡)


For each instance of the syntactic variables t, s, x, and P we obtain a formula of the
form (‡), in essence replacing the intuitive concept of property or statement by the
formal one of formula. But (‡) is precisely the schema Ax6!

Unfortunately, this latter compromise, going from (***) to Ax6, is also forgetful.
It is a fact that we will not establish here (it uses a bit of a set theory argument) that there
are far more properties than there are first-order formulae; thus our “representation”,
or “coding”, of the former by the latter is far too coarse. Thus, saying, on one hand,
for all properties P, which is the meaning of (***), and, on the other hand, for all
formulae P, which is the meaning of schema (‡), i.e., Ax6, are two very different
things!

Oh well, not to worry about this. As it turns out, and this is rather surprising after
all this hacking at (**), we have retained enough in Ax6 to still be able to prove the
usual properties of equality, such as symmetry (x = y → y = x) and transitivity
(x = y ∧ y = z → x = z). □


(1) Just as in our discussion of Ax2 we observe that the definition of substitution
(4.1.28), which has an “undefined” or “don’t do it!” case, keeps Ax6 honest: By
disallowing substitution in certain cases we disallow incorrect instances of the axiom
schema. For, imagine that A is (∃y)¬x = y.

We can try Ax6 on this formula, taking t and x to be the variable x and s to be the
variable y:


x = y → (((∃y)¬x = y)[x := x] ≡ ((∃y)¬x = y)[x := y]) (2)


However, “((∃y)¬x = y)[x := x]” is just (∃y)¬x = y (cf. 4.1.34), but “((∃y)¬x =
y)[x := y]” is undefined by 4.1.28. This means that (2) does not translate into


x = y → ((∃y)¬x = y ≡ (∃y)¬y = y) (3)


in other words, we cannot proclaim (3) as a member of axiom group Ax6. This is
just as well, because (3) is unacceptable!

Let us see why, arguing intuitively (intelligently, but loosely). Now “(∃y)¬y = y”
says, “There is a value of y that is not equal to itself.” This is a falsehood.

Thus, (3) simplifies into


x = y → ¬(∃y)¬x = y (4)


a simplification that uses principle (4) in the list of universal principles given in 1.4.4,
and thus also included in Ax1.


Pause. You surely have noticed that when I argue nontechnically—when I am not 
writing a proof, that is—I avoid saying “axioms” and rather say “principles”. 


Recall that ∃ is just an abbreviation (4.1.17), so (4) really says¹⁰⁴


x = y → (∀y)x = y (5)


Do you believe (5)? If you do, then you must also accept any special case of it, such
as 0 = 0 → (∀y)0 = y. Unfortunately, to the left of “→” I have a “true” statement
but to the right I have a “false” one. So the special case topples, bringing down with
it the general case (5).


(2) Ax6 must not be confused with the Leibniz Rule (Inf1 of 1.4.2) of Part I (which
we will reintroduce shortly for predicate calculus).
Ax6 is about object equality, “=”. In Part I we had no objects to talk about.


Theorem (SFL) of Section 3.4, which [17] nicknamed the Leibniz axiom, has
no connection with axiom schema Ax6 beyond a superficial structural similarity (if,
that is, one forgets the fundamental difference between “=” and “≡”). Moreover,
while (SFL) is provable, as we saw, within propositional calculus, schema Ax6 is
known not to follow from the other axioms of predicate calculus.


We next turn to the primary rules of inference for predicate logic. We have 
augmented the logical axiom set of Boolean logic (1.4.4) by adding axiom groups 
Ax2-Ax6—and that mysterious “partial generalization” whose purpose will be clear 
soon. 

We have added just enough axioms to the Boolean case so that all the properties of 
objects, quantifiers, and equality can be reasoned about without introducing any new 
rules of inference! We will use in first-order logic the very same rules we introduced 
in 1.4.2, namely Inf1 and Inf2. 

Now this must be interpreted to mean that we apply rules Inf1 and Inf2 to the 
Boolean abstractions (4.1.25) of first-order formulae—after all, these are Boolean or 
propositional rules, that is, they apply to Boolean formulae. Therefore we must view 
first-order formulae as Boolean ones in order to make sense of applying these rules 
to them. This is not hard to do: 

The form of Inf2 remains the same in the first-order case, namely,


    A, A ≡ B
    --------   (Eqn)
       B


since the abstraction of a first-order formula A ≡ B will still look like A ≡ B (no
change in shape) because abstractions do not eliminate Boolean connectives, unless
these are in the scope of a quantifier. The “≡” of A ≡ B is not in any such scope
(Why?). In other words: We do not need to explicitly convert first-order formulae to
their abstractions in order to apply Inf2 (equanimity). We apply it directly.


¹⁰⁴ A tiny leap of faith, this. In slow motion, “¬(∃y)¬x = y” is short for “¬¬(∀y)¬¬x = y”, which,
losing the double negations, yields “(∀y)x = y”. Intuitively, anyway!


We must be more careful with Inf1:


            A ≡ B
    ---------------------
    C[p := A] ≡ C[p := B]

Since this rule is applied to Boolean abstractions of first-order formulae, any “p” that
occurs in C must be “visible” in the abstraction; that is, it must not occur within the
scope of a quantifier in C. Thus the rule, without any reference to abstractions, is
stated as

            A ≡ B
    ---------------------   (BL)
    C[p := A] ≡ C[p := B]

provided that p is not in the scope of a quantifier in C.


We will call this rule BL for Boolean Leibniz.


4.2.3 Remark. (1) The qualifier “Boolean” in BL is important as, on one hand, it
reminds us of its origin and in particular the restriction on p, and, on the other hand,
it distinguishes it from derived Leibniz rules that we will soon prove and use. These
derived rules do allow us to replace a Boolean variable p that does occur in the scope
of a quantifier in C, so they act beyond the Boolean structure of formulae.

(2) Why is it a good thing not to add new (primary) rules of inference? Because
since the Boolean axioms are a part (subset) of the first-order axioms, keeping the
rule set invariant means that anything we can prove in Boolean logic we can still
prove in first-order logic as long as the concept of proof remains the same (it does).
We will keep all our Boolean theorems! □


In summary, 


4.2.4 Definition. (First-Order Rules of Inference) The primary first-order logic’s
rules of inference are Eqn and BL above. □


4.2.5 Remark. We can rewrite BL as

           A ≡ B
    -------------------   (BL)
    C[p \ A] ≡ C[p \ B]

provided that p is not in the scope of a quantifier in C,


the reason being that the restriction stated makes capture of a free variable by a
quantifier impossible during the two substitutions in the conclusion part of the rule.
We have seen (p. 135) that, under no-capture conditions, the formulae C[p := A]
and C[p \ A] are identical strings.

Nevertheless, we will retain the original notation for BL for a reason we will
explain later. □

We are ready to calculate (theorems) once again! As promised earlier on in our
discussion here, the concepts of theorem-calculation (proof) and theorem remain
exactly the same as in Part I. We repeat the definitions here, for the record, word for
word:


4.2.6 Definition. (Theorem-Calculations, or Proofs) Let Γ be an arbitrary, given,
set of formulae.¹⁰⁵

A theorem-calculation (or proof) from Γ is any finite (ordered) sequence of
formulae that we may write respecting the following two requirements:


In any stage we may write down


Pr1 Any member of Λ₁ or Γ. Any member of Γ that is not also in Λ₁ we will call a
    nonlogical axiom.


Pr2 Any formula that appears in the denominator of an instance of a rule Inf1-Inf2
    as long as all the formulae in the numerator of the same instance of the (same)
    rule have already been written down at an earlier stage.


We may call a proof from Γ by the alternative name Γ-proof. □


4.2.7 Definition. (Theorems) Any formula A that appears in a Γ-proof is called a
Γ-theorem. We write Γ ⊢ A to indicate this. If Γ is empty (Γ = ∅), i.e., we have no
special assumptions, then we simply write ⊢ A and call A just “a theorem”.
Caution! We may also do this out of laziness and call a Γ-theorem just “a theorem”,
if the context makes clear which Γ ≠ ∅ we have in mind.

We say that A is an absolute, or logical theorem whenever Γ is empty. □


Note that, once again (cf. practice in Part I), in the configuration “Γ ⊢ A” we take
Λ₁ for granted and do not mention it to the left of ⊢.


4.2.8 Exercise. It is a good place here to redo Exercise 1.4.11, so that you can check
your understanding of the concepts proof and theorem. Once again, for exactly the
same reason as in Part I (What was the reason?) we have (verify!):

(1) A ⊢ A, for any A.

(2) Γ ⊢ A, if A ∈ Γ.

(3) ⊢ B, for any axiom B. □


4.2.9 Remark. This remark retells 1.4.8. A Γ-proof of a formula A is, by definition,
a sequence of formulae


B₁, ..., Bₙ, A, C₁, ..., Cₘ


with properties as stated in 4.2.6. It is trivial that if we discard the part “C₁, ..., Cₘ”
of the sequence, then
B₁, ..., Bₙ, A (1)
is still a proof. The reason is that every formula in a proof is either legitimized
outright, without reference to any other formulae, or is legitimized by reference to
formulae to its left.

Thus (1) too proves A, since A occurs in it. This technicality allows us to stop a
proof once we have written down the formula that we want to prove. □


¹⁰⁵ In Part II, formulae means first-order formulae unless otherwise indicated.


So, 4.2.7 tells us what kind of theorems we have:
(1) Anything in Λ₁ ∪ Γ.¹⁰⁶


(2) For any formula C and variable p (not in the scope of any quantifier of C),
C[p := A] ≡ C[p := B], provided A ≡ B was written down already and therefore
is a (Γ-) theorem.


(3) B (any B), provided A ≡ B and A were written down already and therefore are
both (Γ-) theorems.


The above is a recursive definition of (Γ-) theorems, and is worth recording
(compare with Definition 4.1.14).


4.2.10 Definition. (Theorems, Inductively) A formula E is a Γ-theorem iff it fulfills
one of the following:


Th1 E is in Λ₁ ∪ Γ.


Th2 For some formula C and variable p (not in the scope of any quantifier of C), E
is C[p := A] ≡ C[p := B], and (we know that) A ≡ B is a (Γ-) theorem.


Th3 (We know that) A ≡ E and A are (Γ-) theorems. □


The concept of proof defined in 4.2.6 is that of Hilbert-style proof, which we will
arrange vertically, just as in Part I, with annotations. All that we said in Section 1.4
carries over here unchanged. In particular note Example 1.4.13.

Note that wherever in the proofs that we wrote in Part I we said “by Leibniz”, or
just “Leib”, we will here replace it with “by BL”, or just “BL”, and the proof will
remain valid. In particular, part (d) in 1.4.13 establishes ⊢ A ≡ A. Even though
that proof carries over unchanged here, we have in the new setting a more immediate
proof: For any A, A ≡ A is a member of Ax1.


Moving to the results of Chapter 2, again, everything we did there carries over
unchanged. In particular 2.1.1, 2.1.4 and Corollary 2.1.6 (which allows us to use
already-proved theorems in proofs) hold, as well as “the other Eqn” (2.1.11).

Just for the record, “Eqn + Leib Merged” (2.1.16) is still valid and still equivalent
to (as powerful as) Eqn and BL combined. This rule, transcribed in first-order logic,
will now have the condition “where p does not occur in the scope of any quantifier
of C”.¹⁰⁷


¹⁰⁶ We explained the notation “Λ ∪ Γ” in 1.3.14 on p. 36, item (3).
¹⁰⁷ It will turn out that the restriction “where p does not occur in the scope of any quantifier of C” can be
removed from BL to obtain a valid (for first-order logic) derived Leibniz rule. But I am ahead of myself.
Redundant true (2.1.21) and the extremely important metatheorem 2.1.23 remain,
of course, valid—the former for a now-trivial reason (the relevant schema is a member 
of Ax1). 

We do not need to add anything new to what we already said about the equational
proof methodology (Section 2.2) nor need we reiterate that all the results of
Sections 2.4 and 2.5 remain valid.


Note that the proof of modus ponens, which we gave in Boolean logic (2.5.3), is 
also valid in first-order logic. 


The deduction theorem holds in first-order logic exactly as stated originally
(2.6.1). To see this, note that the proof of the “main lemma” (2.6.2) goes through
pretty much unchanged (remember to say “BL” where we said “Leib” before).

We need only to rephrase the basis case as follows: 


Basis. D has complexity 0: So it is one of:


(1) p: Then we must show A → (B ≡ C) ⊢ A → (B ≡ C) and we are done
by 1.4.11.


(2) q (other than p), or ⊤ or ⊥, or φ(t₁, ..., tₙ) or t = s: By reference to 4.1.30
we see that we need to establish that A → (B ≡ C) ⊢ A → (D ≡ D). Well,
start with ⊢ D ≡ D (Ax1). By (3) in 2.5.1, and 2.1.4, ⊢ A → (D ≡ D). We are
done by 2.1.1.


It is worth noting that a really simple alternative proof of the deduction theorem 
can be given for first-order logic using the tools of the next chapter. 


It is extremely important to emphasize, especially for the reader who encountered 
alternative, very complex versions in the literature that: The deduction theorem in 
our version of first-order logic is exactly the same as in the Boolean (propositional) 
case. 


Are those very complex versions wrong, then? No.


The small print here is that it is possible to found first-order logic somewhat differently:
with fewer axioms but with the addition of a primary rule called unconstrained
or strong generalization:

     A
  -------   (1)
  (∀x)A

In a so-founded first-order logic, the deduction theorem is ugly; that is, it has
cumbersome restrictions (cf. [35, 45, 52, 53]). The approach in the present volume
is analogous to that in [2, 13, 50, 51], where the primary rules are, essentially,
propositional. The trick is, rather than adopting rule (1), to encode it in the axioms
(Ax3, Ax4) and to have these axioms and the only two primary rules, Eqn and BL,
simulate a somewhat weaker version of (1).

The side benefit: We can have a user-friendly deduction theorem that reads and 
applies exactly as in the Boolean case! Needless to say, the related results, 2.6.6 
and 2.6.7, hold in first-order logic. 
The most crucial result that carries over from Boolean to predicate calculus is
Post's Theorem: If Γ ⊨taut A, then Γ ⊢ A.
In our particular foundation of first-order logic, the special case of Γ = ∅ is direct
from Ax1:
If ⊨taut A, then A belongs to Ax1, hence ⊢ A.


Even the case of nonempty, but finite, Γ follows easily from Ax1 and MP, as we see
in the next chapter; hence so does 3.3.1.

The (meta)proof of the infinite (“general”) case given in Section 3.2 carries over
unchanged as long as one remembers to work throughout with Boolean abstractions 
(4.1.25) of first-order formulae. In particular, the amended alphabet (3) of p. 94 is 
still correct, if one were to repeat the proof, since a Boolean abstraction contains none 
of the symbols that cannot occur in a Boolean formula (such as =, V, 27/3, ¢’, etc.). 
Corollary 3.3.1 is important for the user, and we will continue to rely on it heavily. 


Very Important! Although we have 3.2.1 in predicate calculus, we must not expect
to have the very same soundness result of 3.1.3. In fact, note that ⊢ x = x (Definitions
4.2.1 and 4.2.7), but ⊭taut x = x. Indeed, the Boolean abstraction of “x = x” is
just some Boolean variable, say, p. A Boolean variable, of course, is not a tautology.
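
The point can be made concrete with a minimal Python sketch. It is an illustration only, not part of the formal development: the tuple encoding of formulae and the names primes, value and is_tautology are assumptions of the sketch. Atomic and quantified formulae are treated as opaque prime units, in the spirit of the Boolean abstraction of 4.1.25, and tautologyhood is tested by truth tables.

```python
from itertools import product


def primes(f):
    """Prime subformulae: Boolean variables, atomic formulae, and quantified
    formulae. The abstraction treats each of them as one opaque unit."""
    tag = f[0]
    if tag in ("var", "atom", "forall"):
        return {f}
    if tag in ("top", "bot"):
        return set()
    return set().union(*(primes(g) for g in f[1:]))


def value(f, v):
    """Truth value of f when the primes get the truth values in the dict v."""
    tag = f[0]
    if tag in ("var", "atom", "forall"):
        return v[f]
    if tag == "top":
        return True
    if tag == "bot":
        return False
    if tag == "not":
        return not value(f[1], v)
    a, b = value(f[1], v), value(f[2], v)
    if tag == "and":
        return a and b
    if tag == "or":
        return a or b
    if tag == "imp":
        return (not a) or b
    if tag == "equiv":
        return a == b
    raise ValueError(tag)


def is_tautology(f):
    """True iff f is true under every assignment to its prime subformulae."""
    ps = sorted(primes(f), key=repr)
    return all(value(f, dict(zip(ps, bits)))
               for bits in product([False, True], repeat=len(ps)))


x_eq_x = ("atom", "x = x")
all_x_zero = ("forall", "x", ("atom", "x = 0"))
excluded_middle = ("or", all_x_zero, ("not", all_x_zero))

print(is_tautology(x_eq_x))           # False: the abstraction is a lone variable
print(is_tautology(excluded_middle))  # True: the abstraction is p or not p
```

The second test formula has the shape p ∨ ¬p once its quantified parts are abstracted, which is why it passes even though nothing is known about “0”.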


So, is predicate logic not sound? 


In fact, it is. That is, its rules preserve truth and its axioms are true. However, the 
truth concept in predicate calculus is narrower than that of tautology and tautological 
implication, as is to be expected: Tautologies (and tautological implications) speak 
of the broadest possible, most abstract concept of truth, one that is oblivious to 
the presence of (object) variables, constants, predicates, functions, quantifiers, and 
equality (between objects) and therefore cannot describe mathematical truth in its 
full variety. 

A different definition of truth (and “preservation of truth”—so-called logical 
implication) is necessary for predicate calculus. More on this in the chapter on
first-order semantics (Chapter 8).


4.3 ADDITIONAL EXERCISES 


1. In the formula
(∀x)((∀x)(∀y)x < y ∨ x > z) ≡ (∀y)y = x


find to which “(∀x)”, if any, each occurrence of x belongs. Here “<” and “>”
are just some nonlogical symbols (predicates of arity 2) of the alphabet.


2. Consider the following (not fully parenthesized) formula, in the first-order language
of arithmetic where 0 is a nonlogical constant symbol:


(∀x)x = 0 ∨ ¬(∀x)x = 0 (1)
(a) Identify all the prime subformulae of the above, and display them by boxing
as in 4.1.25. 


(b) Using boxing indicate the Boolean abstraction of the formula. 
(c) Can you prove formula (1) in predicate logic without the benefit of any 
nonlogical axioms that speak of “0”? 


3. Find the results of the following substitutions. For item (a), work it out according 
to the inductive definition 4.1.28 step by step; for the others, just give the answer 
and explain the reasons if the result is undefined: 


(a) g( f(z), f(y))[x := 7] (where f is a unary function symbol, g is a binary 
function symbol, and 7 is a nonlogical constant symbol) 


(b) (f(z) < 7)[x := 7] (as above, plus we have a nonlogical binary predicate 
symbol <, in connection with which we are using infix notation) 


(c) ((∀x)(f(x) < 7))[x := 7]


(d) ((∀y)(f(z) < 7))[x := 7]

(e) ((∀x)(∀y)(f(7) < g(x, y)))[z := g(y, 7)]

(f) ((∀x)(∀y)(f(z) < g(z, y)))[z := g(y, 7)]

(g) ((∀x)(∀y)(∀z)(f(z) < g(z, y)))[z := g(y, 7)]


4. Consider the language (i.e., set of strings) over alphabet 4.1.2 that is defined as 
follows: 


• A P-formula-calculation consists of a finite sequence of steps. In each step
we may write:
(a) An atomic formula of 4.1.8
(b) A formula of the form ((∀x)A) for any A in the WFF of 4.1.13
(c) (¬A), provided A has already been written
(d) (A ∘ B), for any ∘ ∈ {∧, ∨, →, ≡}, provided A and B have already
been written
• We define a P-formula as any string that appears in a P-formula-calculation.
• We define the complexity of a P-formula as the total number of connectives
(counting repetitions) from {¬, ∧, ∨, →, ≡} that appear in the formula but
not in any subformula of type (b).


Prove that the set of all P-formulae over the alphabet of 4.1.2 is the same as the
set WFF of 4.1.13 over the same alphabet.


Hint. First, verify that the definition of P-formula-calculation differs from 4.1.10 
in one essential aspect. Then, by induction on the complexity of P-formulae, prove 
that every such formula is in WFF. By induction on the complexity of formulae 
of WFF, prove that every such formula is a P-formula. 
5. Prove, by induction on terms, that for any terms t and s, if s is a prefix of t, then
the strings t and s must be identical.


6. Prove that any nonempty proper prefix of a first-order formula must have an excess 
of left brackets. 


7. Prove the unique readability of first-order formulae; that is, for every such formula 
its immediate predecessors are uniquely defined. 


8. Is (∀x)(∀y)x = y → (∀y)y = y an instance of Ax2? Why?


9. Give a proof of ⊢ (∀x)(∀y)x = y → (∀y)y = y.


CHAPTER 5 


TWO EQUIVALENT LOGICS 


This brief chapter aims to develop a few more useful tools that we will employ, 
among others, in the proof of the crucial 6.1.1. 

Recall Remark 2.1.17. In exactly the same manner, two different foundations 
of predicate logic over the same first-order language—two “different logics”—are 
equivalent iff they have the same absolute theorems. 


With half a minute’s reflection, and using the deduction theorem, we see that equivalent
logics also have the same “relative theorems”. That is, fixing a set of assumptions
Γ, one logic proves a formula A from the assumptions Γ iff the other does.


Pause. How could we have two logics over the same language? 


Well, just think of the ingredients: We can choose different logical axioms, or 
different primary rules of inference, or both! 


Here are the two logics over the language of Section 4.1 that we want to compare: 

(1) Our logic as given by 4.2.1 and 4.2.4 (cf. [50, 51]) 

(2) The logic given exactly as in (1) above, except that its only primary rule is MP 
(cf. [13], although in Enderton’s exposition the Boolean variables and constants are 
absent) 
We note that 2.1.1, 2.1.4 and 2.1.6 depend only on the general concept of proof and
not on the choice of logical axioms or rules of inference. Thus 
Both logics above, (1) and (2), satisfy 2.1.1, 2.1.4 and 2.1.6. 


5.0.1 Lemma. Post’s theorem (3.2.1) holds in logic (2) for finite Γ.
Proof. Indeed, let


A₁, A₂, A₃, ..., Aₙ ⊨taut B (i)
We will show
A₁, A₂, A₃, ..., Aₙ ⊢ B (ii)
By 1.3.14(2), (i) yields
⊨taut A₁ → A₂ → A₃ → ... → Aₙ → B (iii)


Thus A₁ → A₂ → A₃ → ... → Aₙ → B is an axiom of logic (2). We can therefore
write the following Hilbert proof, in the logic (2), of B from the hypotheses
A₁, A₂, A₃, ..., Aₙ:


(1) A₁ (hypothesis)
(2) A₂ (hypothesis)
(3) A₃ (hypothesis)
 ⋮
(n) Aₙ (hypothesis)
(n+1) A₁ → A₂ → A₃ → ... → Aₙ → B (axiom)
(n+2) A₂ → A₃ → ... → Aₙ → B ((1, n+1) + MP)
(n+3) A₃ → ... → Aₙ → B ((2, n+2) + MP)
(n+4) A₄ → ... → Aₙ → B ((3, n+3) + MP)
 ⋮
(n+n) Aₙ → B ((n−1, n+n−1) + MP)
(n+n+1) B ((n, n+n) + MP) □
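
The layout of this proof is mechanical enough to be generated by a few lines of code. The following Python sketch is only an illustration under stated assumptions: formulae are strings or nested tuples, the names imp_chain and post_proof are invented here, and the sketch does not check that the hypotheses really tautologically imply the conclusion (it simply trusts that the chain is an Ax1 axiom).

```python
def imp_chain(hyps, concl):
    """Build A1 -> (A2 -> ... -> (An -> B)) as nested ("imp", left, right) tuples."""
    f = concl
    for h in reversed(hyps):
        f = ("imp", h, f)
    return f


def post_proof(hyps, concl):
    """Return the Hilbert proof of Lemma 5.0.1 as a list of (formula, annotation).

    Assumption (not verified here): hyps tautologically imply concl, so that the
    implication chain is an axiom of logic (2).
    """
    lines = [(h, "hypothesis") for h in hyps]
    current = imp_chain(hyps, concl)
    lines.append((current, "axiom"))
    for i, h in enumerate(hyps, start=1):
        # Line i holds h and the last line so far holds h -> rest; MP yields rest.
        assert current[0] == "imp" and current[1] == h
        current = current[2]
        lines.append((current, f"({i}, {len(lines)}) + MP"))
    return lines


for number, (formula, note) in enumerate(post_proof(["A1", "A2", "A3"], "B"), 1):
    print(number, formula, note)
```

For n hypotheses the generated proof has n + n + 1 lines, matching the numbering used in the lemma.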


5.0.2 Lemma. Logic (2) has BL and Eqn as derived rules of inference. 


Proof. Indeed, we know from Part I (3.1.1) that A, A ≡ B ⊨taut B and A ≡ B ⊨taut
C[p := A] ≡ C[p := B], where in the latter case A, B and C are abstractions of
first-order formulae.

By the previous lemma we can replace ⊨taut by ⊢; that is, BL and Eqn are derived
rules of logic (2). □


5.0.3 Metatheorem. Logics (1) and (2) are equivalent.
Proof. The two have the same axioms. Now, the axioms plus the rules BL and Eqn
of the first can simulate the rule MP of the second (2.5.3). Conversely, the axioms plus the
rule MP of the second can simulate the rules BL and Eqn of the first (5.0.2). □

Why did we bother to prove the above metatheorem? How is it useful? Because 
the metatheory of logic (2) is simpler, i.e., logic (2) is easier to talk and argue about. 
This was the main reason. We can gauge its usefulness in Chapter 6 where we prove 
the generalization (derived) rule. 

An immediate taste of the facilitation provided by 5.0.3 is offered by the next 
exercise. 


5.0.4 Exercise. Invoking 5.0.3—and thus arguing about logic (2) rather than logic (1) 
—give an easy, short, and 2.6.2-independent proof of the deduction theorem (exactly 
as stated in 2.6.1) for first-order logic. Needless to say, this will not be circular: 
We did not use the (first-order) deduction theorem toward any of the results of this 
chapter. □
CHAPTER 6


GENERALIZATION AND ADDITIONAL 
LEIBNIZ RULES 


In this chapter we finally bring quantifiers, in particular “∀”, to the fore; thus we
really start doing predicate logic!


By predicate logic we understand either logic (1) or (2) of the previous chapter. It
does not matter which one, as they are equivalent (5.0.3)!
We will be sure to use logic (2) when we prove facts about logic.


6.1 INSERTING AND REMOVING “(∀x)”


6.1.1 Metatheorem. (Weak Generalization) Let Γ ⊢ A and let moreover x not
occur free in any formula found in the set Γ. Then Γ ⊢ (∀x)A as well.


Proof. The (meta)proof is, essentially, a construction that takes a Γ-proof of A in
logic (2) and transforms it step by step into a valid Γ-proof of (∀x)A, still in logic (2).
The meta-tool employed is once again induction on natural numbers, indeed,
induction on the length of a Γ-proof that includes A.
Basis (shortest possible proof). A occurs in a proof of length 1. Then A is one of
the following:
(1) Axiom. Then, since all partial generalizations of all formulae found in groups
Ax1-Ax6 are also axioms, (∀x)A is also an axiom. But then (cf. 4.2.6 and 4.2.7)
(∀x)A is a Γ-theorem.


(2) In Γ. By hypothesis, A has no free x occurrences. Thus A → (∀x)A is an axiom
(group Ax4). Since Γ ⊢ A and ⊢ A → (∀x)A (cf. 4.2.6 and 4.2.7), we have
Γ ⊢ (∀x)A by an application of MP.


We now take as I.H. the truth of the claim for any A that appears in a proof of
length n or less.
Let us establish the case where A appears in a proof of length n + 1. We have
three cases:


(i) A is not the last formula of the proof. Then we are done by the I.H.


(ii) A is the last formula in the proof, but is an axiom or in Γ. Then we are done by
the Basis argument.


(iii) None of the above. So A was obtained by MP, that is, B occurs in the proof,
and so does B → A for some formula B. Since both B and B → A occur in
Γ-proofs of lengths n or less, the I.H. kicks in and we have


Γ ⊢ (∀x)B (*)
and
Γ ⊢ (∀x)(B → A) (**)
Now
Γ ⊢ (∀x)(B → A) → (∀x)B → (∀x)A (***)


by Ax3 and 4.2.6 and 4.2.7, thus, by MP (twice), from (*), (**), and (***),
Γ ⊢ (∀x)A. □
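
Since the metaproof is a construction, it can be sketched as a program that rewrites a proof line by line. The Python sketch below is a hedged illustration, not the author's formulation: the step encoding ("ax", "hyp", "mp"), the tuple representation of formulae, and the name generalize are assumptions of the sketch, and it does not verify that "ax" formulae are genuine axioms or that x is not free in the hypotheses.

```python
def generalize(proof, x):
    """Turn a logic-(2) proof into one that also derives (forall x)F for each line F.

    proof: list of steps ("ax", F), ("hyp", F) or ("mp", i, j), where line i is B
    and line j is ("imp", B, F), with 0-based indices. Assumed, not checked: every
    "ax" formula is a logical axiom and x is not free in any hypothesis.
    Returns (formula, annotation) pairs; annotations use 1-based line numbers.
    """
    out = []       # the new proof
    formulas = []  # formula proved by each original step
    where = []     # where[k]: 0-based index in out of (forall x)formulas[k]
    for step in proof:
        k = len(out)
        if step[0] == "ax":
            f = step[1]
            out.append((("forall", x, f), "axiom (generalization of an axiom)"))
        elif step[0] == "hyp":
            f = step[1]
            gen = ("forall", x, f)
            out.append((f, "hypothesis"))
            out.append((("imp", f, gen), "Ax4"))
            out.append((gen, f"MP on {k + 1}, {k + 2}"))
        else:
            _, i, j = step
            b, imp = formulas[i], formulas[j]
            assert imp[0] == "imp" and imp[1] == b, "not an MP instance"
            f = imp[2]
            gb, gimp, gf = ("forall", x, b), ("forall", x, imp), ("forall", x, f)
            out.append((("imp", gimp, ("imp", gb, gf)), "Ax3"))
            out.append((("imp", gb, gf), f"MP on {where[j] + 1}, {k + 1}"))
            out.append((gf, f"MP on {where[i] + 1}, {k + 2}"))
        formulas.append(f)
        where.append(len(out) - 1)
    return out


example = [("hyp", "B"), ("hyp", ("imp", "B", "A")), ("mp", 0, 1)]
for n, (formula, note) in enumerate(generalize(example, "x"), start=1):
    print(n, formula, note)
```

Each original line costs at most three new lines, mirroring the basis cases (1) and (2) and the MP case (iii) of the metaproof.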


(1) The proof we just gave justifies the (at first sight mysterious) requirement
that along with the formulae in the groups Ax1-Ax6 we include all their partial
generalizations, too, as logical axioms. This was used in step (1) of the basis.

(2) The above metatheorem was proved under a premise that at first sight might appear
awfully restrictive: “No formula in Γ has a free x.” Surely this restriction is close to
impossible to meet, rendering the metatheorem “mostly inapplicable”, when one
has infinitely many formulae in Γ. How are we to be sure that some particular variable
is not free in every single one of them?¹⁰⁸ It is irrelevant; see below.


¹⁰⁸ There are significant practical cases where Γ is infinite; e.g., Peano arithmetic and Zermelo/Fraenkel
set theory have infinitely many nonlogical axioms. In the case of these two theories (cf. p. 116) we have
some other tricks that neutralize the problem of an infinite Γ. Although infinitely many, we can choose
the nonlogical axioms so that they do not have free variables at all.
6.1.2 Corollary. If there is a proof of A from Γ, where all the formulae from Γ used
in the proof have no free x, then Γ ⊢ (∀x)A.


Proof. Let B₁, ..., Bₙ be all the formulae from Γ used in the proof.

(1) By assumption, none of them has x free.

(2) By 4.2.6 and 4.2.7, B₁, ..., Bₙ ⊢ A.

By (1) and (2) and Metatheorem 6.1.1, B₁, ..., Bₙ ⊢ (∀x)A. By 2.1.1, Γ ⊢
(∀x)A as well. □


Thus, without loss of generality, we can always pretend that the Γ in 6.1.1 is finite,
for even if it is not, in every single proof only a finite part of it is used, anyway.


6.1.3 Corollary. If ⊢ A, then ⊢ (∀x)A.


Proof. This is immediate by taking Γ = ∅. Surely, you can’t find an A in this Γ with
a free x in it! □


6.1.4 Remark. Two important remarks need to be made here:

(1) Metatheorem 6.1.1, and in particular Corollary 6.1.2, allow us to insert the
formula (∀x)A at any point after A was written in a Γ-proof as long as whatever
formulae of Γ were invoked in the proof contain no free x.

This is all right simply because in a Γ-proof we can insert any Γ-theorem we
happen to know of (cf. 2.1.6). We do know that (∀x)A is a Γ-theorem as soon as we
learn that A is! (6.1.1)

Similarly, the metatheorem’s second corollary, 6.1.3, allows us to insert (∀x)A
in any proof, as long as we already know that A is an absolute theorem (this may
have been established by the proof we are working on, or by some previous proof,
a lemma, for example).

(2) One must be careful not to confuse the derived rule “if ⊢ A, then ⊢ (∀x)A”
with “A ⊢ (∀x)A”. The former rule, weak generalization, imposes a constraint on
the premise A in its use: Once we have A we may write down (∀x)A, but this step
is constrained by how A was obtained. A cannot be arbitrary; rather it must be an
absolute theorem.

The latter rule, called strong or unconstrained generalization, allows us to write
(∀x)A once we have A (written down, that is) regardless of how A was obtained.
There are no conditions! A could have very well been an arbitrary hypothesis.


Is this distinction between the two rules “real”? Yes! We just saw that our logic
((1) or (2) on p. 151; it does not matter which) does allow weak generalization.
We will see later on that it does not allow strong generalization. The reason is that
if we had A ⊢ (∀x)A in our logic, then we would also have, by the deduction
theorem, ⊢ A → (∀x)A. Semantic considerations in Chapter 8 will show this to be
impossible (cf. also the discussion of formula (5) on p. 143). □
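
As an informal preview of those semantic considerations, here is a small Python sketch. It relies on assumptions that the text only makes precise in Chapter 8: a two-element domain {0, 1}, the formula A taken to be “x = 0”, and the intuitive reading of (∀x) as “for every element of the domain”. The names domain, A, forall_A and implication are invented for the illustration.

```python
domain = [0, 1]


def A(x):
    """The sample formula 'x = 0', evaluated at a value of x."""
    return x == 0


def forall_A():
    """Intuitive reading of (forall x)A over the finite domain."""
    return all(A(d) for d in domain)


def implication(x):
    """Truth value of A -> (forall x)A when the free x denotes the given value."""
    return (not A(x)) or forall_A()


print([implication(x) for x in domain])  # [False, True]
```

When x denotes 0 the premise holds but the conclusion fails, so under this reading A → (∀x)A cannot be an absolute theorem of a sound logic.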


A closely related and extremely useful derived rule is trivially derived from first 
principles: 
6.1.5 Metatheorem. (Specialization Rule) (∀x)A ⊢ A[x := t]


Proof. Of course, if the expression “A[x := t]” ends up being undefined, then we
have nothing to prove. Here is a Hilbert-style proof:


(1) (∀x)A (hypothesis)
(2) (∀x)A → A[x := t] (Ax2)
(3) A[x := t] ((1, 2) + MP) □


6.1.6 Corollary. (∀x)A ⊢ A


Proof. Take t to be x in 6.1.5 (cf. 4.1.34). □


Corollary 6.1.6 and Metatheorem 6.1.1 (or Corollary 6.1.3), which we will refer
to as spec and gen respectively in annotations, are a team that makes life extremely
easy in Hilbert proofs when it comes to dealing with the quantifier ∀.

Corollary 6.1.6 lets us unconditionally remove the leftmost quantifier in the course
of a proof. This action uncovers whatever Boolean connectives were buried in the
scope of the removed quantifier¹⁰⁹ so that we can, in the subsequent steps of the
proof, apply the purely Boolean techniques of Part I.

Before the end of the proof, if the targeted theorem requires us to do so, we can
reintroduce the removed quantifier using 6.1.1 (or 6.1.3 as appropriate); conditions
for insertion, of course, apply.


This comment may appear too general now, but it will gain complete clarity as we 
progress, presenting several examples where the comment is put to use. 
Here is the first such example: 


6.1.7 Theorem. (Distributivity of ∀ over ∧) ⊢ (∀x)(A ∧ B) ≡ (∀x)A ∧ (∀x)B


Proof. We use a Ping-Pong argument (cf. 3.4.5) to prove ⊢ (∀x)(A ∧ B) → (∀x)A ∧
(∀x)B and ⊢ (∀x)A ∧ (∀x)B → (∀x)(A ∧ B). Below we label “(→)” and “(←)”
the two directions that we have to do:


(→)
(1) (∀x)(A ∧ B) (hypothesis)
(2) A ∧ B ((1) + spec (6.1.6))
(3) A ((2) ⊨taut (3) + 3.3.1)
(4) B ((2) + tautological implication)
(5) (∀x)A ((3) + gen (6.1.1; Okay: Hypothesis has no free x))
(6) (∀x)B ((4) + gen (6.1.1; Okay: Hypothesis has no free x))
(7) (∀x)A ∧ (∀x)B ((5, 6) + tautological implication)


Note. Annotation of use of Post’s theorem (usually invoked in the form of 3.3.1)
might take the form given in step (3), “(2) ⊨taut (3)”, but the ones in steps (4) and (7)
are equally acceptable.


¹⁰⁹ Recall that the Boolean abstraction of a formula of the form (∀x)A is a Boolean variable, say p. Thus
all the Boolean connectives of A are invisible in the abstraction.


(←)
(1) (∀x)A ∧ (∀x)B (hypothesis)
(2) (∀x)A ((1) + tautological implication)
(3) (∀x)B ((1) + tautological implication)
(4) A ((2) + spec)
(5) B ((3) + spec)
(6) A ∧ B ((4, 5) + tautological implication)
(7) (∀x)(A ∧ B) ((6) + gen (6.1.1); Okay: Line (1) has no free x) □


Easy and natural, was it not? 


Worth Stating: We applied Post’s theorem in the proof above, and we will be
increasingly doing so in future proofs. But how do we know when


A₁, ..., Aₙ ⊨taut B (*)


holds, in order to apply Post’s theorem, which allows us to state and use the derived
rule A₁, ..., Aₙ ⊢ B?

This is a practical, not a theoretical question, and it has a simple answer: In
practice, (*) should be trivial to verify using semantic ideas, or trivial for us to “see”
outright, as, e.g., certainly A ≡ B ⊨taut A → B is.

In any case where such verification is not outright obvious, (*) should be justified
separately from (i.e., outside of) the proof where it is used.

How? Semantically (⊨taut) or syntactically (⊢); it is all the same by Post’s
theorem. Use whichever approach is easier for you.
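
For the semantic route, a brute-force truth table is all that is needed. The following Python sketch is one possible way to do this, offered as an illustration only: the function name taut_implies and the encoding of formulae as Python functions on an assignment are assumptions of the sketch, and the letters A and B stand for the Boolean abstractions of the formulae involved.

```python
from itertools import product


def taut_implies(premises, conclusion, names):
    """True iff every assignment to `names` that satisfies all the premises
    also satisfies the conclusion (tautological implication)."""
    for bits in product([False, True], repeat=len(names)):
        v = dict(zip(names, bits))
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True


def equiv(v):
    """Truth value of A equivalent-to B under the assignment v."""
    return v["A"] == v["B"]


def imp(v):
    """Truth value of A implies B under the assignment v."""
    return (not v["A"]) or v["B"]


print(taut_implies([equiv], imp, ["A", "B"]))  # True
print(taut_implies([imp], equiv, ["A", "B"]))  # False
```

The second call fails on the assignment that makes A false and B true, which shows that the converse implication is not available.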


6.1.8 Theorem. ⊢ (∀x)(∀y)A ≡ (∀y)(∀x)A


Proof. Another Ping-Pong argument:


(→)
(1) (∀x)(∀y)A (hypothesis)
(2) (∀y)A ((1) + spec (6.1.6))
(3) A ((2) + spec)
(4) (∀x)A ((3) + 6.1.1; line (1) has no free x)
(5) (∀y)(∀x)A ((4) + 6.1.1; line (1) has no free y)
(←)


The proof can be reversed, (5) to (1), with a straightforward change of annotation.
Exercise! □
6.1.9 Metatheorem. (∀-Monotonicity) If Γ ⊢ A → B, then Γ ⊢ (∀x)A → (∀x)B,
provided that no formula in Γ has a free x.


Proof.


(1) A → B (theorem from Γ; we start where its proof finished)
(2) (∀x)(A → B) ((1) + gen (6.1.1); restriction on Γ makes this okay)
(3) (∀x)(A → B) →
    (∀x)A → (∀x)B (axiom)
(4) (∀x)A → (∀x)B ((2, 3) + MP) □


Why “monotonicity”? 

Well, one may view “→” as an analogue of “≤”. You see, ≤ satisfies “x ≤ y iff
max(x, y) = y” while → satisfies (Ax1) “A → B ≡ A ∨ B ≡ B”.

If you next think of “∨” as analogous to “max” (not a terribly far-fetched suggestion,
since many reasonable people think of t as 1 and f as 0; certainly the designers
of the languages PL/I and C do), then you see my point.


But then the metatheorem says that if A is “less than or equal to” B, then (∀x)A
is “less than or equal to” (∀x)B.

If you finally intuitively think of “(∀x)” as a “function” (in the algebraic sense) of
the “argument” A, then the jargon “monotonicity of ∀” makes sense.


So there. A silly comment, justifying a silly terminology. In annotations A-mon
will stand for ∀-monotonicity.
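
The analogy can be checked by brute force on truth values. The Python sketch below is only an intuition pump under stated assumptions: truth values are the numbers 0 and 1, a “formula with a free x” is modelled as a table of 0/1 values over a small finite domain, and (∀x) is read as taking the minimum of that table. None of this is part of the syntactic metatheorem; the names implies, check_analogy and check_monotone are invented here.

```python
from itertools import product


def implies(a, b):
    """Truth value of a -> b, with 0 for false and 1 for true."""
    return int((not a) or b)


def check_analogy():
    """a -> b agrees with a <= b, which agrees with max(a, b) == b."""
    return all(
        implies(a, b) == int(a <= b) == int(max(a, b) == b)
        for a, b in product([0, 1], repeat=2)
    )


def check_monotone(size):
    """Count pairs (A, B) of 0/1 tables on a domain of `size` points with
    A <= B pointwise, asserting min(A) <= min(B) for each such pair."""
    count = 0
    for A in product([0, 1], repeat=size):
        for B in product([0, 1], repeat=size):
            if all(a <= b for a, b in zip(A, B)):
                assert min(A) <= min(B)
                count += 1
    return count


print(check_analogy())    # True
print(check_monotone(3))  # 27 pairs checked, none violates monotonicity
```

On a three-point domain there are 27 pairs with A below B pointwise (three admissible value pairs per point), and the minimum respects the order in every one of them.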


6.1.10 Corollary. If ⊢ A → B, then ⊢ (∀x)A → (∀x)B.


Proof. Just take Γ = ∅. □


6.1.11 Corollary. If Γ ⊢ A ≡ B, then also Γ ⊢ (∀x)A ≡ (∀x)B, as long as Γ has
no formulae with free x.


Proof.


(1) A ≡ B (proved from Γ; now continue the proof)
(2) A → B ((1) ⊨taut (2) + 3.3.1)
(3) B → A ((1) ⊨taut (3) + 3.3.1)
(4) (∀x)A → (∀x)B ((2) + A-mon (6.1.9))
(5) (∀x)B → (∀x)A ((3) + A-mon (6.1.9))
(6) (∀x)A ≡ (∀x)B ((4, 5) ⊨taut (6) + 3.3.1) □


6.1.12 Corollary. If ⊢ A ≡ B, then also ⊢ (∀x)A ≡ (∀x)B.
Proof. Just take Γ = ∅. □

Corollaries 6.1.11 and 6.1.12 are forms of the Leibniz rule that, for the first time so
far, allow us to replace a Boolean variable that occurs inside the scope of a quantifier,
and moreover they allow us to do so not caring if capture occurs!

For example, think of 6.1.12 as “if ⊢ A ≡ B, then ⊢ C[p \ A] ≡ C[p \ B], where
C is (∀x)p”.

Can we extend 6.1.11 and 6.1.12 to hold for any formula C? You bet! We do so
in the next section, but first let me make a few more comments on the generalization
rule(s).


6.1.13 Remark. (1) Is the name generalization for the rule described in 6.1.1 apt?
Yes. The rule is used in everyday math practice, where to prove that “a formula A
that depends on the variable x holds for all values of x” we do instead the following:

We prove that “A holds for an arbitrary value of x, in other words, a value that we
have made no special assumptions about”. We then generalize and conclude (since
the value of x that we used was unbiased) that A holds for all values of x.

Which part in 6.1.1 formally says that when we proved A we did not take into
account any special value of x? It is the part that says that our assumptions (Γ) do
not even talk about x, that is, x is not referenced in them at all, because x is not free.

(2) Is (∀x)A the “same as” the formula below?


A(0) ∧ A(1) ∧ A(2) ∧ A(3) ∧ ··· (i)


where I am using here the shorthand A(k) for A[x := k].
No, for some obvious and for some more esoteric reasons.
First the obvious:


Obvious 1. 0, 1, 2, ... are nonlogical symbols. A logical expression such as (∀x)A
that is meant to make sense for all applications of logic (not just for
number theory and other theories that speak of the integers) has no
business being defined by (or being “equivalent” to) an expression that
refers to such nonlogical symbols.


Obvious 2. Classical logic, that is, the logic we develop and use in this volume
(which, by the way, is the logic used by mathematicians, computer
scientists, etc.) does not allow infinitely long formulae like (i).


There are esoteric reasons according to which even if we remove the two objections
above by cleverly tweaking question (2), the answer remains “no”.

Okay, let us tweak the original question and ask instead: Would “(∀x)A” when its
use is restricted to number theory mean (i) above? This bypasses objection 1 above
by restricting the question.

Let us next remove objection 2 above (Obvious 2) and rephrase, asking instead,
without reference to infinitely long formulae, the intuitively equivalent question:


Is it true in number theory that we have both (ii) and (iii) below?


(∀x)A ⊢ A(k), for all k ≥ 0 (ii)

A(0), A(1), A(2), ... ⊢ (∀x)A (iii)


Well, we do have (ii) by 6.1.5. However we do not have (iii): A result of Kurt Gödel
([16]) known as “the first incompleteness theorem” has as a corollary that there are
formulae A of number theory for which all the premises in (iii) are theorems of
number theory, but (∀x)A is not. □


For the reader who will not easily let go: “But,” you say, “if the nonlogical axioms
of (Peano) number theory characterize N and the various operations and relations on
it (e.g., +, ×, <) uniquely, then surely (iii) ought to hold since the hypothesis part
says that A is true of each natural number. Surely, (∀x)A says the same thing?”

This is precisely where the problem lies! You see, it is known that first-order logic
(over the language of Peano arithmetic and equipped with Peano’s axioms) is incapable
of uniquely characterising N equipped with its operations and relations. That
is, there are supersets of N that contain infinitely many other “numbers”, but where
S, +, ×, < and all standard operations on numbers still make sense. More concretely,
these sets equipped with analogues of the operations and relations S, +, ×, < also
satisfy all the Peano axioms of number theory!

Clearly, on such sets, saying that the left-hand side of ⊢ in (iii) holds is not enough
to guarantee that the right-hand side also holds, because the left-hand side of ⊢ does
not say that A holds for all numbers (it says so only for the natural numbers, and is
silent about the additional numbers).


The team of spec and gen also allows us to work with “simultaneous substitution”:
For example, if I have proved A that has x and y (and perhaps other variables that I
do not care about) free, and if t and s are any terms, can I then conclude that after I
substitute t and s into x and y simultaneously I will obtain a theorem?


6.1.14 Example. First, I should be clear what this “simultaneous” means: Suppose
that A is x = y. Let “t” be y and “s” be x. Then,

    (x = y)[x := t][y := s] is x = x

while

    (x = y)[y := s][x := t] is y = y

The result depends on the order of the two substitutions.

On the other hand, if we drop t and s into the x and y slots simultaneously, so that
neither of the two substitutions has the time to mess up the other, then we get y = x.

In effect, the “simultaneous substitution” is (or can be simulated by) a sequence
of consecutive substitutions performed with the help of fresh variables: Pick two new
variables, z and w. Then do

    (x = y)[x := z][y := w][z := t][w := s]

or

    (x = y)[x := z][y := w][w := s][z := t]

Both work, i.e., yield y = x. Verify!


In the general case (of two simultaneous substitutions), which involves any formula
A, any substitution slots x and y, and terms t and s, the new (distinct) variables z
and w are chosen to be fresh with respect to all of A, t, s. Then, a bit of reflection (or
an induction on the complexity of A) shows that A[x := z][y := w][z := t][w := s]
and A[x := z][y := w][w := s][z := t] yield the same string,¹¹⁰ corresponding to
the intuitive understanding of “simultaneous substitution into x and y”. □


6.1.15 Definition. (Simultaneous Substitution) The expression

    A[x_1, ..., x_r := t_1, ..., t_r]        (1)

denotes simultaneous substitution of the terms t_1, ..., t_r into the variables x_1, ..., x_r
in the following sense:

Let z_1, ..., z_r be distinct new variables that do not occur at all (either as free or
bound) in any of A, t_1, ..., t_r. Then (1) is short for

    A[x_1 := z_1]...[x_r := z_r][z_1 := t_1]...[z_r := t_r]        (2)

where in the interest of generality our notation employed the metavariables x_i and
z_i. □


This “simultaneity” that the definition is talking about is not in the physical
time sense; rather, it means that when we effect the substitutions sequentially, we are
nevertheless guaranteed that none of the t_i that have already replaced an x_i (in two
steps, via z_i) are subsequently altered by virtue of some t_j being substituted into
one of t_i’s variables. This cannot happen since none of the t_i contains any z_j.

In effect, we have, through 6.1.15, simulated what we intuitively understand as
“simultaneous substitution”: The substitution of the t_i into the x_i is order
independent.
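
The recipe in (2) is mechanical enough to try out in code. The following Python sketch
is only an illustration under simplifying assumptions: formulae are quantifier-free and
are represented as lists of tokens, so every substitution is defined and capture is not
an issue. The helper names subst, fresh_vars and simultaneous are made up for this
sketch and do not come from the text.

```python
from itertools import count

def subst(formula, var, term):
    """Replace every occurrence of the token `var` by the token list `term`."""
    out = []
    for tok in formula:
        out.extend(term if tok == var else [tok])
    return out

def fresh_vars(n, avoid):
    """Return n distinct names z0, z1, ... none of which occurs in `avoid`."""
    names = (f"z{i}" for i in count())
    result = []
    while len(result) < n:
        name = next(names)
        if name not in avoid:
            result.append(name)
    return result

def simultaneous(formula, variables, terms):
    """A[x_1, ..., x_r := t_1, ..., t_r], computed as in (2):
    first every x_i becomes a fresh z_i, then every z_i becomes t_i."""
    used = set(formula) | {tok for t in terms for tok in t} | set(variables)
    zs = fresh_vars(len(variables), used)
    for x, z in zip(variables, zs):
        formula = subst(formula, x, [z])
    for z, t in zip(zs, terms):
        formula = subst(formula, z, t)
    return formula

# Example 6.1.14: A is x = y, t is y, s is x (assumed token encoding).
A = ["x", "=", "y"]
t, s = ["y"], ["x"]

seq_x_then_y = subst(subst(A, "x", t), "y", s)
seq_y_then_x = subst(subst(A, "y", s), "x", t)
both_at_once = simultaneous(A, ["x", "y"], [t, s])
print(" ".join(seq_x_then_y), "|", " ".join(seq_y_then_x), "|", " ".join(both_at_once))
```

Run on Example 6.1.14, the two sequential orders give x = x and y = y, while
simultaneous returns y = x, whichever order the pairs (x, t) and (y, s) are listed in.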


6.1.16 Exercise. Given a formula A, if z is fresh, then A[x := z] is defined
(cf. 4.1.28).
Hint. Induction on A. □


6.1.17 Exercise. Suppose that A[x := t] is defined.

If z is fresh, then A[x := z][z := t] is also defined, and has the same result as
A[x := t].

Hint. Induction on A. □


6.1.18 Exercise. Definition 6.1.15 yields the same string, unaffected by permutations
within the groups “[x_1 := z_1]...[x_r := z_r]” and “[z_1 := t_1]...[z_r := t_r]” in (2).
The definition is also unaffected by the choice of fresh variables z_1, ..., z_r.
Hint. Induction on A. □


¹¹⁰ And so do A[y := w][x := z][z := t][w := s] and A[y := w][x := z][w := s][z := t].

6.1.19 Metatheorem. (Substitution Theorem) If ⊢ A and t_1, ..., t_r are any terms,
then ⊢ A[x_1, ..., x_r := t_1, ..., t_r] as well.


Of course, if the substitution is not defined, then there is nothing to prove. 


Proof. By 6.1.15 we need to prove that if z_1, ..., z_r are fresh with respect to
A, t_1, ..., t_r, then we have

    ⊢ A[x_1 := z_1]...[x_r := z_r][z_1 := t_1]...[z_r := t_r]

The above follows by applying the metatheorem below 2r times:

    If ⊢ A and A[x := t] is defined, then ⊢ A[x := t], where t is any term

The above has a trivial proof (Hilbert style, but not written down rigidly):
I have ⊢ (∀x)A by 6.1.3. An application of 6.1.5 yields ⊢ A[x := t]. □


6.1.20 Corollary. If Γ ⊢ A and there is a proof of this fact that uses formulae from
Γ that have no free x_1, ..., x_r, then Γ ⊢ A[x_1, ..., x_r := t_1, ..., t_r] as well.


Proof. Let us fix attention to one such proof that uses hypotheses from Γ where
none of x_1, ..., x_r occur free. Say, B_1, ..., B_m are the hypotheses used. Thus, by
definition of proof (4.2.6)

    B_1, ..., B_m ⊢ A        (1)

Applying the deduction theorem to (1) m times, we get

    ⊢ B_1 → ... → B_m → A        (2)

By 6.1.19 we get

    ⊢ (B_1 → ... → B_m → A)[x_1 := z_1]...[x_r := z_r][z_1 := t_1]...[z_r := t_r]        (3)

where the z_i are fresh.
Since none of the x_i or z_i appear free in the B_j, (3) can be rewritten as (remember
the priority of the “[x := ...]” notation!)

    ⊢ B_1 → ... → B_m → A[x_1 := z_1]...[x_r := z_r][z_1 := t_1]...[z_r := t_r]        (4)

Applying MP to (4), m times, yields

    B_1, ..., B_m ⊢ A[x_1 := z_1]...[x_r := z_r][z_1 := t_1]...[z_r := t_r]

The above yields what we want, by hypothesis strengthening (2.1.1):

    Γ ⊢ A[x_1 := z_1]...[x_r := z_r][z_1 := t_1]...[z_r := t_r] □


6.2 LEIBNIZ RULES THAT AFFECT QUANTIFIER SCOPES


6.2.1 Metatheorem. (Weak Leibniz, “WL”) If ⊢ A ≡ B, then ⊢ C[p \ A] ≡
C[p \ B].


Proof. Knowing that the result holds in the “simple case” of 6.1.12 (cf. also the
remarks following the proof of 6.1.12), we are motivated to provide a proof by
induction on the complexity of C.


Basis. In the case of formula complexity 0 we have two subcases of interest:


(1) C is p. Then we must show “if ⊢ A ≡ B, then ⊢ A ≡ B”. There is nothing to
    do!


(2) C is not p, i.e., it is one of the following: q (≠ p), t = s, φ(t_1, ..., t_n), ⊤,
    ⊥. Then we must show “if ⊢ A ≡ B, then ⊢ C ≡ C”. Since ⊢ C ≡ C holds
    anyway (Ax1), the if part is redundant and we are done.


The complex cases:


(i) C is ¬D. By I.H. ⊢ D[p \ A] ≡ D[p \ B]; hence ⊢ ¬D[p \ A] ≡ ¬D[p \ B]
    by tautological implication (3.3.1) and thus ⊢ (¬D)[p \ A] ≡ (¬D)[p \ B]
    (cf. 4.1.29).


(ii) C is D ∘ E, where ∘ ∈ {∧, ∨, →, ≡}. By I.H. ⊢ D[p \ A] ≡ D[p \ B] and
    ⊢ E[p \ A] ≡ E[p \ B]; hence ⊢ D[p \ A] ∘ E[p \ A] ≡ D[p \ B] ∘ E[p \ B] by
    tautological implication and thus ⊢ (D ∘ E)[p \ A] ≡ (D ∘ E)[p \ B] by 4.1.29.


(iii) C is (∀x)D. This is “the interesting case”.
    By I.H. ⊢ D[p \ A] ≡ D[p \ B]. By 6.1.12, ⊢ (∀x)D[p \ A] ≡ (∀x)D[p \
    B], which Definition 4.1.29 allows us to rewrite as ⊢ ((∀x)D)[p \ A] ≡
    ((∀x)D)[p \ B]. □


In [51] I called the WL-rule “WLUS” (Weak Leibniz with unconditional substitution).
I now prefer the simpler “WL”.

But why “weak”? Because unlike BL, which allows us to apply it regardless of
where, or how, we got the hypothesis A ≡ B, we will not apply WL unless we know
that the hypothesis is an absolute theorem.

Bad things will happen if we ignore this restriction: We can end up contradicting
things we know. Ignoring the restriction means that no matter why we were allowed
to write down A ≡ B in a proof, we may next write, for any C, C[p \ A] ≡ C[p \ B].

In other words, that

    A ≡ B ⊢ C[p \ A] ≡ C[p \ B]        (i)

is a derived rule.
But if that is so, then I can also derive the “rule” below

    A ⊢ (∀x)A        (ii)

which I know is impossible in our logic (cf. 6.1.4).

Here is how to get (ii) from (i): First note that ⊢ (∀x)⊤ ≡ ⊤. Indeed, a trivial
Ping-Pong argument (cf. 3.4.5) suffices, noting that (∀x)⊤ → ⊤ is in Ax2 while
(as ⊤ has no free x) ⊤ → (∀x)⊤ is in Ax4. Secondly,

(1) A                   ⟨hypothesis⟩
(2) A ≡ ⊤               ⟨(1) + tautological implication⟩
(3) (∀x)A ≡ (∀x)⊤       ⟨(2) + rule (i); “C-part” is (∀x)p⟩
(4) (∀x)A ≡ ⊤           ⟨(3) + ⊢ (∀x)⊤ ≡ ⊤ + tautological implication⟩
(5) (∀x)A               ⟨(4) + tautological implication⟩
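
That rule (ii) cannot be sound can also be seen by a small semantic experiment. The
Python sketch below rests on assumptions that are not spelled out in this section:
formulae are interpreted in a structure, and a formula with a free x is evaluated
relative to a chosen value for x. If (ii) were a derived rule, the deduction theorem
would give ⊢ A → (∀x)A, so (granting soundness) a structure and a value of x that
make A true and (∀x)A false rule it out. The two-element domain and the predicate A
are hypothetical choices made for the illustration.

```python
# Hypothetical structure: a two-element domain (an assumption of this sketch).
domain = [0, 1]

def A(x):
    """A made-up formula with x free: 'x equals 0'."""
    return x == 0

def forall_A():
    """Truth value of (∀x)A in the structure: A must hold at every element."""
    return all(A(x) for x in domain)

# Under the assignment x = 0 the hypothesis A holds, yet (∀x)A does not.
hypothesis_holds = A(0)
conclusion_holds = forall_A()
print(hypothesis_holds, conclusion_holds)
```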


WL is perhaps the most useful Leibniz rule in predicate logic, as it allows total 
freedom in substituting “equivalents for equivalents”. The price to pay for this 
freedom is, of course, the restriction on how the premise is obtained. 

By analyzing the proof of WL one readily sees that we can be a bit less restrictive 
in the assumption, as follows: 


6.2.2 Corollary. (A More Generous WL) If Γ ⊢ A ≡ B and if none of the bound
variables of C occur free in the formulae of Γ, then Γ ⊢ C[p \ A] ≡ C[p \ B] as
well.


Proof. The proof is exactly the same as that of WL on p. 165. We change all “⊢” there
to “Γ ⊢” here and note that there are no changes in the Basis part. All the induction
step cases regarding Boolean connectives are handled by tautological implication
once more.

The interesting case is when C is once again (∀x)D. The I.H. here yields
Γ ⊢ D[p \ A] ≡ D[p \ B]. Now the assumption guarantees that Γ-formulae have no
free x, so 6.1.11 applies:

    Γ ⊢ (∀x)D[p \ A] ≡ (∀x)D[p \ B]

In view of 4.1.29, this is what we want. □

In practice, one finds 6.2.1 used more often than 6.2.2.


We next strengthen BL. This is a “strong” Leibniz in that the hypothesis A ≡ B
can be plucked out of the blue without any restriction on its origin. On the other hand,
we have an annoying side condition that p must not be in the scope of a quantifier of
C.

Well, we can drop the side condition from BL, but we have to continue using
“conditional substitution” (4.1.30) as the discussion following the proof of 6.2.1
made clear.


6.2.3 Metatheorem. (Strong Leibniz, “SL”) A ≡ B ⊢ C[p := A] ≡ C[p := B]


Again, it goes without saying that if the right hand side of ⊢ is undefined then
we have nothing to prove as the expression “C[p := A] ≡ C[p := B]” does not
denote any formula. I will not remind us again of this understanding of the use of the
metanotations “[p := A]” and “[x := t]” since it has been already established enough
times.


Proof. The proof is by induction on the complexity of C. It is similar, but not
identical, to the proof of WL (6.2.1).


Basis. In the case of formula complexity 0 we have two subcases of interest:

(1) C is p. Then we must show A ≡ B ⊢ A ≡ B, which holds by 1.4.11.


(2) C is not p, i.e., it is one of the following: q (≠ p), t = s, φ(t_1, ..., t_n), ⊤,
    ⊥. Then we must show A ≡ B ⊢ C ≡ C. This follows from ⊢ C ≡ C (Ax1)
    and 2.1.1.


The complex cases:


(i) C is ¬D. By I.H. A ≡ B ⊢ D[p := A] ≡ D[p := B]; hence A ≡
    B ⊢ ¬D[p := A] ≡ ¬D[p := B] by tautological implication, and thus
    A ≡ B ⊢ (¬D)[p := A] ≡ (¬D)[p := B] (cf. 4.1.30).


(ii) C is D ∘ E. By I.H. A ≡ B ⊢ D[p := A] ≡ D[p := B] and A ≡ B ⊢
    E[p := A] ≡ E[p := B]; hence A ≡ B ⊢ D[p := A] ∘ E[p := A] ≡ D[p :=
    B] ∘ E[p := B] by tautological implication and thus A ≡ B ⊢ (D ∘ E)[p :=
    A] ≡ (D ∘ E)[p := B] by 4.1.30.


(iii) C is (∀x)D. This is “the interesting case”. By I.H. A ≡ B ⊢ D[p := A] ≡
    D[p := B]. Since C[p := A] and C[p := B] are defined, Definition 4.1.30
    implies that x is not free in either A or B. Therefore (cf. 4.1.21) it is not free
    in A ≡ B. By 6.1.11, A ≡ B ⊢ (∀x)D[p := A] ≡ (∀x)D[p := B], which
    4.1.30 allows us to rewrite as A ≡ B ⊢ ((∀x)D)[p := A] ≡ ((∀x)D)[p :=
    B]. □


In [51] I called this rule “SLCS” (Strong Leibniz with conditional substitution). I
now prefer the simpler “SL”.

Note that we may from now on, conveniently, forget BL and, as far as rules of
type Leibniz are concerned, use only WL and SL, since the latter subsumes BL.
This last observation justifies the concluding sentence in Remark 4.2.5.


6.2.4 Corollary. D → (A ≡ B) ⊢ D → (C[p := A] ≡ C[p := B])

Proof. We use the deduction theorem to prove instead

    D → (A ≡ B), D ⊢ C[p := A] ≡ C[p := B]

Indeed

(1) D                           ⟨hypothesis⟩
(2) D → (A ≡ B)                 ⟨hypothesis⟩
(3) A ≡ B                       ⟨(1, 2) + MP⟩
(4) C[p := A] ≡ C[p := B]       ⟨(3) + SL⟩

Note how SL (rather than WL) is applicable, because line (3) does not necessarily
contain an absolute theorem. □


6.2.5 Remark. (1) The previous proof is not circular, as one might carelessly assume
by the proved statement’s similarity to 2.6.2. Indeed, Exercise 5.0.4 promises,
within predicate logic, a direct, short and easy proof of the deduction theorem that
is oblivious to 2.6.2.

Note as well that 6.2.4 extends 2.6.2 since the latter can deal only with “p” that
are not in the scope of some quantifier that occurs in C (Boolean logic knows nothing
about such scopes).

(2) An alternative proof of 6.2.4 extends the statement to

    D ∘ (A ≡ B) ⊢ D ∘ (C[p := A] ≡ C[p := B])        (*)

where ∘ is any of ∨, ∧, →: Note that SL and the deduction theorem yield

    ⊢ (A ≡ B) → (C[p := A] ≡ C[p := B])

By tautological implication we get

    ⊢ D ∘ (A ≡ B) → D ∘ (C[p := A] ≡ C[p := B])

By MP we get (*). □


6.3 THE LEIBNIZ RULES “8.12” 


This section is meant as no more than extra homework since we will never use the 
two tools that we discuss here. 

The tag “8.12” in the section title refers to the one used for the “twin” rules
Leibniz in [17], p. 148. These, reproduced verbatim from loc. cit. but recast in
standard quantifier notation and using the metavariables x and p, are

                      A ≡ B
    -------------------------------------------        (8.12a)
    (∀x)(C[p := A] → D) ≡ (∀x)(C[p := B] → D)

and

                   D → (A ≡ B)
    -------------------------------------------        (8.12b)
    (∀x)(D → C[p := A]) ≡ (∀x)(D → C[p := B])

This section will prove valid forms of these two rules.

6.3.1 Remark. (1) Refer to 4.1.18 once more if you plan to go to the source [17].
(2) The rules (8.12a) and (8.12b) are axiomatically given in [17] as the primary
rules for quantifiers. As we will see here this is totally unnecessary for us to imitate,
because, with a correction, these are provable rules, that is, they are derived rules,
just like “gen”, “WL” and “SL” are in our logic.
(3) Are there any constraints on the premises of these rules? If so, what might
they be?
Indeed, there must be constraints; otherwise either rule can derive “A ⊢ (∀x)A”,
which we know is unprovable in our logic. □


6.3.2 Exercise. (I) Show that unless the premise in (8.12a) has restrictions on how
it is obtained, (8.12a) implies strong generalization, and is therefore invalid.

Hint. Experiment, taking B to be the formula ⊤, D to be the formula ⊥, and C to
be the formula ¬p.


(II) Show that unless the premise in (8.12b) has restrictions on how it is obtained,
(8.12b) implies strong generalization, and is therefore invalid.

Hint. Experiment, taking B to be the formula ⊤, D to be the formula ⊤, and C to
be the formula p.


Using the hints, you will produce, in each question (I) and (II), a Hilbert-style
proof

(1) A           ⟨hypothesis⟩
     ⋮
(n) (∀x)A       ⟨some appropriate reason⟩ □

6.3.3 Metatheorem. (The Valid “8.12a”) If Γ ⊢ A ≡ B, then Γ ⊢ (∀x)(C[p :=
A] → D) ≡ (∀x)(C[p := B] → D), provided no formula of Γ has a free x.


Proof.

(1) A ≡ B                       ⟨from Γ; we now continue the proof⟩
(2) (C[p := A] → D) ≡
    (C[p := B] → D)             ⟨(1) and SL + 3.3.1 to add “→ D”⟩
(3) (∀x)(C[p := A] → D) ≡
    (∀x)(C[p := B] → D)         ⟨(2) and 6.1.11; restriction on Γ used⟩ □


Note how in step (2) above SL, unlike WL, affords us the luxury not to worry
whether A ≡ B is an absolute theorem. Here, in general, it is not.


6.3.4 Corollary. If ⊢ A ≡ B, then ⊢ (∀x)(C[p := A] → D) ≡ (∀x)(C[p :=
B] → D).


Proof. Take Γ = ∅. □


6.3.5 Metatheorem. (The Valid “8.12b”) If Γ ⊢ D → (A ≡ B), then Γ ⊢ (∀x)(D
→ C[p := A]) ≡ (∀x)(D → C[p := B]), provided no formula of Γ has a free x.


Proof. The hypothesis yields, via 6.2.4 followed by 3.3.1:

    Γ ⊢ (D → C[p := A]) ≡ (D → C[p := B])

We are done by an application of 6.1.11. □


6.3.6 Corollary. If ⊢ D → (A ≡ B), then ⊢ (∀x)(D → C[p := A]) ≡ (∀x)(D →
C[p := B]).


Proof. Take Γ = ∅. □


6.4 ADDITIONAL USEFUL TOOLS 


The authors of [17] introduce several theorem-schemata for quantifiers in their
Chapter 8, which nevertheless they present as “axioms” and offer without proof. We will
prove in this section that all these “axioms” are actually provable in our setting. All 
are useful additions to our toolbox and embody standard techniques of predicate 
logic such as the “variant theorem” (also known as the dummy renaming theorem) 
used to rename bound variables, and the so-called prenex operations used to bubble 
quantifiers from the interior of a formula all the way to its left boundary (cf. for 
example, 6.4.1, 6.4.2, 6.4.3). 


The inquisitive reader who will want to explore the detailed presentation of these
metatheorems in the cited source will benefit from a word of caution: First, all these
so-called axioms about “generalized quantifiers” are provable from first principles.
Second, some of them, namely those that speak about the “other” quantifiers Σ and Π
(that is, the generalized sum and product of natural numbers), do not even belong
to pure logic but belong to number theory.¹¹¹

Thus the “unified” notation “⋆” intended by [17] to express properties of all
“quantifiers”, ∃, ∀, Σ, Π, within pure logic, can do so only for ∃ and ∀.

The why is straightforward: In predicate logic, when we are not employing it
within a specific application such as number theory, statements involving Σ, Π
that also rely on special properties of these symbols have no place.

First, statements involving properties of the nonlogical symbols such as Σ, Π
cannot be logical axioms, nor can they be absolute theorems, by definition.

Second, all these statements regarding these two (Σ, Π) symbols happen to
be provable, but this is achievable only after one has introduced the Peano axioms,
which, incidentally, do not even refer to Σ, Π as these are not primitive symbols of the
language of arithmetic! The symbols Σ and Π, after a lot of work and acrobatics,
can be defined in (Peano) number theory as secondary nonlogical symbols, and then
all their properties as stated in Chapter 8 of [17] can be proved. However, I promise
not to do any of this work here since this volume is about the methods and tools of
nonapplied (i.e., pure) logic.¹¹²


¹¹¹ Chapter 8 of [17] also conflates the “quantifiers” ⋃ and ⋂ along with the others. The properties of
these are provable in axiomatic set theory.
¹¹² The ambitious reader who wants to see how all this is done, in detail, may look up [53].


Well, then, for the sake of exercise, but also to enrich further our logical toolbox,
let us prove all these results in the balance of this section.

It is fair to warn that in the context of predicate logic Hilbert-style proofs have the
edge over equational-style proofs. The former are amenable to the methodology of
removing/inserting quantifiers so that between the removal and insertion one can use
Boolean techniques, notably the all-powerful Post’s theorem (3.2.1, usually in the
form 3.3.1).

The technique of removing/inserting quantifiers is not well suited to equational
proofs partly because “A” and “(∀x)A” cannot be connected with “≡” (indeed not
even with “→” in one direction) in general.

Of course, as always we will apply “the most natural proof style” for the problem
at hand, but be aware that “most natural” is a subjective assessment!


Before we embark on proofs of the results advertised at the beginning of this section,
I would like to expand a bit on equational methodology.
Our equational proofs so far have been based on a sequence of equivalences

    A_1 ≡ A_2, A_2 ≡ A_3, ..., A_{n-1} ≡ A_n

that we know are, each, Γ-theorems (p. 62). We showed that we have 2.2.1 and its
corollaries, which allow an equational proof to establish Γ ⊢ A_1 ≡ A_n, and thus
Γ ⊢ A_1 iff Γ ⊢ A_n.

Our extension allows any one of the ≡ symbols to be replaced by →. In this case,
by Post’s theorem (3.2.1), we have just Γ ⊢ A_1 → A_n. If we also know that Γ ⊢ A_1,
then MP shows that Γ ⊢ A_n as well.

The layout is

      A_1
    ∘ ⟨annotation⟩
      A_2
    ∘ ⟨annotation⟩
      ⋮
      A_{n-1}
    ∘ ⟨annotation⟩
      A_n
    ∘ ⟨annotation⟩
      A_{n+1}

where “∘” in each instance is, independently of the other instances, one of ⇔ or
⇒. The symbol ⇒ in this layout occurs only on the left margin; it is an alias for “→”
but it is conjunctional. That is,

      A
    ⇒ ⟨annotation⟩
      B
    ⇒ ⟨annotation⟩
      C

means “A → B, B → C”, which are two separate formulae, not “A → B → C”.
The latter is, of course, short for (A → (B → C)).
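
The difference between the conjunctional reading and the single formula A → B → C
is purely Boolean, so a truth table settles it. The Python sketch below (the helper
names implies, is_tautology, chain and nested are assumptions of this illustration)
checks that “A → B, B → C” together yield A → C, whereas A → (B → C) alone
does not.

```python
from itertools import product

def implies(p, q):
    """Truth table of the Boolean connective →."""
    return (not p) or q

def is_tautology(f, nvars):
    """Evaluate the Boolean function f on every assignment to its nvars arguments."""
    return all(f(*vals) for vals in product([False, True], repeat=nvars))

def chain(a, b, c):
    # Conjunctional reading: (A → B) and (B → C) together imply A → C.
    return implies(implies(a, b) and implies(b, c), implies(a, c))

def nested(a, b, c):
    # The single formula A → (B → C) does not imply A → C.
    return implies(implies(a, implies(b, c)), implies(a, c))

chain_ok = is_tautology(chain, 3)
nested_ok = is_tautology(nested, 3)
print(chain_ok, nested_ok)
```

The second check fails, for instance, when A is true and both B and C are false.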


We now embark on our task.

6.4.1 Theorem. ⊢ (∀x)(A → B) ≡ (A → (∀x)B), provided x is not free in A.


Proof. We use a Ping-Pong argument (cf. 3.4.5) in conjunction with the deduction
theorem.
(→) I want

    ⊢ (∀x)(A → B) → (A → (∀x)B)

but I’d rather (cf. 2.6.1) prove

    (∀x)(A → B) ⊢ A → (∀x)B

and, indeed, I’d rather (cf. 2.6.1 again!) prove

    (∀x)(A → B), A ⊢ (∀x)B

Okay, let’s do it! (Never forget the discussion following 6.1.6 on p. 158.)

(1) (∀x)(A → B)     ⟨hypothesis⟩
(2) A               ⟨hypothesis⟩
(3) A → B           ⟨(1) + spec (6.1.6)⟩
(4) B               ⟨(2, 3) + MP⟩
(5) (∀x)B           ⟨(4) + gen (6.1.1); Okay since (1, 2) have no free x⟩

(←) I want

    ⊢ (A → (∀x)B) → (∀x)(A → B)

I might as well do (cf. 2.6.1) the easier

    A → (∀x)B ⊢ (∀x)(A → B)        (1)

Seeing that A → (∀x)B has no free x, I can prove the still easier (no quantifiers in
right-hand side!)

    A → (∀x)B ⊢ A → B        (2)

and then apply 6.1.1 to conclude. The deduction theorem allows me to prove
something even simpler:

    A → (∀x)B, A ⊢ B        (3)

Here it goes (proof of (3)):

(1) A → (∀x)B       ⟨hypothesis⟩
(2) A               ⟨hypothesis⟩
(3) (∀x)B           ⟨(1, 2) + MP⟩
(4) B               ⟨(3) + spec⟩ □


We can also give an equational proof of the (→) direction (extended in this
section to allow both the conjunctional ⇔ and ⇒ in the left margin) as follows:

(→)
      (∀x)(A → B)
    ⇒ ⟨Ax3⟩
      (∀x)A → (∀x)B
    ⇔ ⟨SL: “Numerator:” Ax4 + (∀x)A → A (Ping-Pong); “C-part” is p → (∀x)B⟩
      A → (∀x)B

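6.4.2 Corollary. ⊢ (∀x)(A ∨ B) ≡ A ∨ (∀x)B, provided x is not free in A.

Proof. (Equational)

      (∀x)(A ∨ B)
    ⇔ ⟨WL + tautology A ∨ B ≡ ¬A → B; “C-part” is (∀x)p⟩
      (∀x)(¬A → B)
    ⇔ ⟨6.4.1; no free x in ¬A⟩
      ¬A → (∀x)B
    ⇔ ⟨tautology, hence a theorem⟩
      A ∨ (∀x)B □
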
Most of the results we prove here have interesting duals where ∀ and ∃ are
interchanged (and so are ∧ and ∨). The first one we prove with a routine (equational)
calculation, but we will leave the proof of the majority of these duals to the reader.

Recall the definition of the informal (logical) symbol ∃ (4.1.17):

    (∃x)A is short for ¬(∀x)¬A

Now, ⊢ ¬(∀x)¬A ≡ ¬(∀x)¬A (member of Ax1). We may choose to use the
∃-abbreviation in, say, the left-hand side of ≡ to obtain

    ⊢ (∃x)A ≡ ¬(∀x)¬A

We will frequently use this absolute theorem in what follows, often in connection
with WL, and will nickname it “definition of ∃”.

6.4.3 Corollary. ⊢ (∃x)(A ∧ B) ≡ A ∧ (∃x)B, provided x is not free in A.


Proof. Below we use WL when capture may in general happen. Of course, in such
a case, the hypothesis of the rule has to be an absolute theorem “⊢ X ≡ Y”.

      (∃x)(A ∧ B)
    ⇔ ⟨definition of ∃⟩
      ¬(∀x)¬(A ∧ B)
    ⇔ ⟨WL and deMorgan; “C-part” is ¬(∀x)p⟩
      ¬(∀x)(¬A ∨ ¬B)
    ⇔ ⟨SL and 6.4.2 (no free x in ¬A); “C-part” is ¬p⟩
      ¬(¬A ∨ (∀x)¬B)
    ⇔ ⟨deMorgan⟩
      ¬¬A ∧ ¬(∀x)¬B
    ⇔ ⟨tautology⟩
      A ∧ ¬(∀x)¬B
    ⇔ ⟨SL and definition of ∃; “C-part” is A ∧ p⟩
      A ∧ (∃x)B □


Here are the remaining results we promised (in quotes are the nicknames that the
authors of [17] give to some of these results):


1. “Empty range”. ⊢ (∀x)(⊥ → A) ≡ ⊤. By redundant true we just prove
   ⊢ (∀x)(⊥ → A). Since ⊢ ⊥ → A (in Ax1) we are done by an application
   of 6.1.3.


2. “One point rule”. Provided that x is not free in the term t, ⊢ (∀x)(x = t → A) ≡
   A[x := t]. We employ a Ping-Pong argument.

   (→) Note that since there is no free x in t,

       (x = t → A)[x := t] is t = t → A[x := t]

   Thus

   (1) (∀x)(x = t → A) → t = t → A[x := t]     ⟨Ax2⟩
   (2) (∀x)(x = x)                              ⟨partial generalization of Ax5⟩
   (3) t = t                                    ⟨(2) + 6.1.5⟩
   (4) (∀x)(x = t → A) → A[x := t]              ⟨(1, 3) + 3.3.1⟩

   (←)

   (1) x = t → (A ≡ A[x := t])                  ⟨Ax6⟩
   (2) A[x := t] → x = t → A                    ⟨(1) + 3.3.1⟩
   (3) (∀x)A[x := t] → (∀x)(x = t → A)          ⟨(2) + 6.1.10; (2) is absolute⟩
   (4) A[x := t] → (∀x)(x = t → A)              ⟨(3) + Ax4 + 3.3.1⟩

   Note that Ax4 is applicable in step (4) since there is no free x in A[x := t].

3. “One point rule, ∃-version”. ⊢ (∃x)(x = t ∧ A) ≡ A[x := t] if x is not free
   in t.

   Exercise! (Hint. Use an equational calculation and the ∀-version of the one point
   rule.)


4. “Distributivity of ∀ over ∧”. ⊢ (∀x)(A → B) ∧ (∀x)(A → C) ≡ (∀x)(A →
   B ∧ C).

   We calculate a proof as follows:

         (∀x)(A → B) ∧ (∀x)(A → C)
       ⇔ ⟨6.1.7⟩
         (∀x)((A → B) ∧ (A → C))
       ⇔ ⟨6.1.11 + tautology (Ax1) (A → B) ∧ (A → C) ≡ (A → B ∧ C)⟩
         (∀x)(A → B ∧ C)

   We could also invoke WL itself above, but 6.1.11 is simpler and invoking it
   involves writing less annotation.


The metatheorem that we just proved generalizes 6.1.7 to the case of the “bounded”
or “relativized” quantifier ∀. The authors of [17] write the metatheorem thus

    (∀x|A : B) ∧ (∀x|A : C) ≡ (∀x|A : B ∧ C)

while the corresponding notation in [2] is

    (∀x)_A B ∧ (∀x)_A C ≡ (∀x)_A (B ∧ C)

The general notation for bounded quantification is not much in use in the literature.
However, special cases are used, such as “(∀x)_{∈y} A” and also “(∀x ∈ y)A”,
which mean (∀x)(x ∈ y → A), and “(∀x)_{<y} A” and also “(∀x < y)A” that
mean (∀x)(x < y → A).


5. The proof of the dual of the above, “distributivity of ∃ over ∨”, is left to the reader.
   It states: ⊢ (∃x)(A ∧ B) ∨ (∃x)(A ∧ C) ≡ (∃x)(A ∧ (B ∨ C)). In the notation
   of [2] it is written as: ⊢ (∃x)_A B ∨ (∃x)_A C ≡ (∃x)_A (B ∨ C).


6. “Range split”, where range in [17] refers to the A of (∀x)_A B.
   ⊢ (∀x)(A ∨ B → C) ≡ (∀x)(A → C) ∧ (∀x)(B → C), or, in Bourbaki’s
   notation, (∀x)_{A∨B} C ≡ (∀x)_A C ∧ (∀x)_B C.

   We calculate as follows:

         (∀x)(A → C) ∧ (∀x)(B → C)
       ⇔ ⟨6.1.7⟩
         (∀x)((A → C) ∧ (B → C))
       ⇔ ⟨6.1.11 and the tautology (A → C) ∧ (B → C) ≡ ((A ∨ B) → C)⟩
         (∀x)((A ∨ B) → C)


7. “Interchange of dummies”, as bound variables are called in [17]. This generalizes
   6.1.8 to the case of bounded quantifiers. It states, ⊢ (∀x)(A → (∀y)(B → C)) ≡
   (∀y)(B → (∀x)(A → C)), on the condition that y is not free in A and x is not free
   in B. To highlight the relationship to 6.1.8 I also write the above in [2]-notation:

       ⊢ (∀x)_A (∀y)_B C ≡ (∀y)_B (∀x)_A C

   Let us now calculate:

         (∀x)(A → (∀y)(B → C))
       ⇔ ⟨6.1.11 + 6.4.1 (no free y in A)⟩
         (∀x)(∀y)(A → (B → C))
       ⇔ ⟨WL + obvious tautology; “C-part” (∀x)(∀y)p⟩        (*)
         (∀x)(∀y)(B → (A → C))
       ⇔ ⟨6.1.8⟩
         (∀y)(∀x)(B → (A → C))
       ⇔ ⟨6.1.11 + 6.4.1 (no free x in B)⟩
         (∀y)(B → (∀x)(A → C))

   Note. Step (*) uses WL rather than the simpler 6.1.11 since the latter deals with
   a single ∀ up in front.

8. The dual of the above is

   ⊢ (∃x)(A ∧ (∃y)(B ∧ C)) ≡ (∃y)(B ∧ (∃x)(A ∧ C)), on the condition that y
   is not free in A and x is not free in B.

   It can be trivially proved using the “∃-definition” and an equational argument.
   Exercise!


9. “Nesting”. ⊢ (∀x)(∀y)(A ∧ B → C) ≡ (∀x)(A → (∀y)(B → C)), on the
   condition that y is not free in A.

         (∀x)(A → (∀y)(B → C))
       ⇔ ⟨6.1.11 + 6.4.1 (no free y in A)⟩
         (∀x)(∀y)(A → (B → C))
       ⇔ ⟨WL + obvious tautology; “C-part” (∀x)(∀y)p⟩
         (∀x)(∀y)(A ∧ B → C)


10. The ∃-dual of the above is

    ⊢ (∃x)(∃y)(A ∧ B ∧ C) ≡ (∃x)(A ∧ (∃y)(B ∧ C)), on the condition that y is
    not free in A.

    It is easy to prove equationally. Exercise!
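
Several of the results in the list above can be sanity-checked by brute force over a
small structure. The Python sketch below is an illustration and not a proof: it assumes
that ∀ and ∃ range over a hypothetical two-element domain D, takes the term t to
denote a fixed element c, and enumerates all predicates as truth tables. A formula that
“has no free y” is modelled as a predicate that depends on x only. All names are
made up for the sketch.

```python
from itertools import product

D = (0, 1)                 # a tiny hypothetical domain; larger ones are only slower
BOOLS = (False, True)

def implies(p, q):
    return (not p) or q

def unary():
    """All unary predicates on D, each as a dict: element -> truth value."""
    return [dict(zip(D, vals)) for vals in product(BOOLS, repeat=len(D))]

def binary():
    """All binary predicates on D, each as a dict: (x, y) -> truth value."""
    pairs = list(product(D, D))
    return [dict(zip(pairs, vals)) for vals in product(BOOLS, repeat=len(pairs))]

def one_point_forall():
    # (∀x)(x = t → A) ≡ A[x := t], with t denoting the element c
    return all(all(implies(x == c, A[x]) for x in D) == A[c]
               for A in unary() for c in D)

def one_point_exists():
    # (∃x)(x = t ∧ A) ≡ A[x := t]
    return all(any(x == c and A[x] for x in D) == A[c]
               for A in unary() for c in D)

def range_split():
    # (∀x)(A ∨ B → C) ≡ (∀x)(A → C) ∧ (∀x)(B → C)
    return all(all(implies(A[x] or B[x], C[x]) for x in D)
               == (all(implies(A[x], C[x]) for x in D)
                   and all(implies(B[x], C[x]) for x in D))
               for A in unary() for B in unary() for C in unary())

def interchange_of_dummies():
    # (∀x)(A → (∀y)(B → C)) ≡ (∀y)(B → (∀x)(A → C)); A sees only x, B only y
    return all(all(implies(A[x], all(implies(B[y], C[x, y]) for y in D)) for x in D)
               == all(implies(B[y], all(implies(A[x], C[x, y]) for x in D)) for y in D)
               for A in unary() for B in unary() for C in binary())

results = (one_point_forall(), one_point_exists(), range_split(), interchange_of_dummies())
print(results)
```

Every check succeeds on this domain. That is consistent with the theorems but proves
nothing about arbitrary structures; the proofs above do that work.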


The next metatheorem was given the nickname dummy renaming in [17] (where it is
axiom (8.21)). Elsewhere in the literature (e.g., [45]), one refers to the metatheorem
as the variant metatheorem.¹¹³

By either nickname it simply states something that we expect at the intuitive level,
something we would be prepared to shrug off by saying, referring to the bound
variable, “What’s in a name?” After all, we know that Σ_{i=1}^{n} i^2 = Σ_{k=1}^{n} k^2.

Indeed, under some simple and not so restrictive conditions, a bound variable can
be provably renamed without changing a formula’s provability.


6.4.4 Theorem. (Dummy Renaming for ∀) If z does not occur in A (i.e., neither
free, nor bound), then ⊢ (∀x)A ≡ (∀z)A[x := z].


The practical usefulness of the above is when z does not occur in (∀x)A either,
i.e., when z ≠ x as well, for if z and x are the same, then the theorem becomes the
trivial tautology (∀x)A ≡ (∀x)A (cf. 4.1.34).

Proof. We go Ping-Pong:

(→)

(1) (∀x)A → A[x := z]           ⟨Ax2: z is fresh for A; no capture: A[x := z] defined⟩
(2) (∀z)(∀x)A →
    (∀z)A[x := z]               ⟨(1) + ∀-mon (6.1.10)⟩
(3) (∀x)A →
    (∀z)A[x := z]               ⟨(2) + ⊢ (∀x)A → (∀z)(∀x)A from Ax4 + 3.3.1⟩

(←)

(1) (∀z)A[x := z] →
    A[x := z][z := x]           ⟨Ax2; A[x := z][z := x] is defined, cf. 4.1.35⟩
(2) (∀z)A[x := z] → A           ⟨(1) rewritten; by 4.1.35, A[x := z][z := x] is A⟩
(3) (∀x)(∀z)A[x := z] →
    (∀x)A                       ⟨(2) + ∀-mon⟩
(4) (∀z)A[x := z] →
    (∀x)A                       ⟨(3) + ⊢ (∀z)A[x := z] → (∀x)(∀z)A[x := z]
                                 (from Ax4; no free x in (∀z)A[x := z]) + 3.3.1⟩ □


¹¹³ A variant of (∀x)A is (∀z)A[x := z] for some fresh z.


6.4.5 Corollary. (Dummy Renaming for ∃) If z does not occur in A, then ⊢ (∃x)A
≡ (∃z)A[x := z].


Proof. Exercise! □

6.4.6 Exercise. Show that ⊢ (∀x)(∀x)A ≡ (∀x)A. □


6.4.7 Exercise. With two examples (within logic) show that the restriction according
to which z in 6.4.4 must be neither free nor bound in A is necessary. □


Equipped with the dummy renaming metatheorem we can stop worrying about
“capture”. For example, whenever we plan to do A[x := t] we can always settle for
A′[x := t] instead, where A′ is obtained from A by replacing all of the latter’s bound
variables by fresh ones (with respect to A, t). An induction on the complexity of
A along with the pair WL + 6.4.4 yields ⊢ A ≡ A′. This yields the metatheorem
(cf. 6.1.19) “if ⊢ A, then ⊢ A′[x := t], if A′ is chosen as above” (because under
the circumstances, ⊢ A implies ⊢ A′ to begin with; on the other hand, A′[x := t] is
defined).
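
The recipe “rename the bound variables, then substitute” is easy to mechanize. The
Python sketch below is an illustration under assumptions: formulae are nested tuples
over ¬, → and ∀ only, and subst follows a common textbook definition of A[x := t]
that is undefined (here: returns None) when a variable of t would be captured.
Definition 4.1.28 itself is not reproduced in this section, so its exact clauses may
differ. All function names, and the predicate name R, are hypothetical.

```python
from itertools import count

# Formulas: ("atom", name, terms) | ("not", A) | ("imp", A, B) | ("all", x, A)
# Terms:    a variable name (str) or a tuple (function_name, *subterms)

def term_vars(t):
    if isinstance(t, str):
        return {t}
    return set().union(*[term_vars(u) for u in t[1:]])

def all_vars(A):
    """Every variable occurring in A, free or bound."""
    tag = A[0]
    if tag == "atom":
        return set().union(*[term_vars(t) for t in A[2]])
    if tag == "not":
        return all_vars(A[1])
    if tag == "imp":
        return all_vars(A[1]) | all_vars(A[2])
    return {A[1]} | all_vars(A[2])              # ("all", x, D)

def free_vars(A):
    tag = A[0]
    if tag == "atom":
        return set().union(*[term_vars(t) for t in A[2]])
    if tag == "not":
        return free_vars(A[1])
    if tag == "imp":
        return free_vars(A[1]) | free_vars(A[2])
    return free_vars(A[2]) - {A[1]}

def subst_term(t, x, s):
    if isinstance(t, str):
        return s if t == x else t
    return (t[0],) + tuple(subst_term(u, x, s) for u in t[1:])

def subst(A, x, s):
    """A[x := s] (assumed definition); None signals 'undefined because of capture'."""
    tag = A[0]
    if tag == "atom":
        return ("atom", A[1], tuple(subst_term(t, x, s) for t in A[2]))
    if tag == "not":
        D = subst(A[1], x, s)
        return None if D is None else ("not", D)
    if tag == "imp":
        D, E = subst(A[1], x, s), subst(A[2], x, s)
        return None if D is None or E is None else ("imp", D, E)
    y, D = A[1], A[2]
    if y == x or x not in free_vars(D):
        return A                                # nothing to replace under this quantifier
    if y in term_vars(s):
        return None                             # y would capture a variable of s
    D2 = subst(D, x, s)
    return None if D2 is None else ("all", y, D2)

def rename_bound(A, avoid):
    """Return a variant A' of A whose bound variables are fresh w.r.t. `avoid` and A."""
    used = set(avoid) | all_vars(A)
    fresh = (f"v{i}" for i in count())
    def go(B):
        tag = B[0]
        if tag == "atom":
            return B
        if tag == "not":
            return ("not", go(B[1]))
        if tag == "imp":
            return ("imp", go(B[1]), go(B[2]))
        z = next(v for v in fresh if v not in used)
        used.add(z)
        return ("all", z, go(subst(B[2], B[1], z)))
    return go(A)

# A is (∀y) R(x, y); substituting the term y for x would be captured by (∀y).
A = ("all", "y", ("atom", "R", ("x", "y")))
direct = subst(A, "x", "y")
A_prime = rename_bound(A, {"x", "y"})
after_renaming = subst(A_prime, "x", "y")
print(direct, A_prime, after_renaming)
```

Here the direct substitution is undefined, whereas the variant A′ (bound variable
renamed to v0) accepts the substitution and yields (∀v0) R(y, v0).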


6.5 INSERTING AND REMOVING “(∃x)”


Inserting and removing (∃x) is an analogous acrobatic to that of inserting and
removing (∀x), performed for the same reason: to reduce a proof, as much as possible,
to one where Post’s theorem can be liberally applied. It is a technique that is very
often used in everyday math, and is quite powerful.

First, to insert ∃ is rather trivial. We have the following tools:


6.5.1 Theorem. (Dual of Ax2) ⊢ A[x := t] → (∃x)A

Proof.

      A[x := t] → (∃x)A
    ⇔ ⟨SL + “∃-def” (p. 173); “C-part” is A[x := t] → p⟩
      A[x := t] → ¬(∀x)¬A
    ⇔ ⟨tautology⟩
      (∀x)¬A → ¬A[x := t]

The last line is an instance of Ax2, hence a theorem; therefore so is the first. □


6.5.2 Corollary. (Dual of the Specialization Rule) A[x := t] ⊢ (∃x)A

Proof. 6.5.1 and MP. □


6.5.3 Corollary. A ⊢ (∃x)A


Proof. By 6.5.2, taking t to be x. □

Now let us turn to removing ∃. We will need a few tools at first.


6.5.4 Metatheorem. (∀ Introduction) If x does not occur free either in Γ or in A,
then Γ ⊢ A → B iff Γ ⊢ A → (∀x)B.


Proof. Only if direction. We are assuming Γ ⊢ A → B. 6.1.9 yields Γ ⊢ (∀x)A →
(∀x)B. By the condition on A, I have ⊢ A → (∀x)A (Ax4). I am done by
tautological implication.

Pause. This was a Hilbert proof written in free-style, just as a mathematician or
computer scientist would have written it: not vertically, and not fully numbered. So
is the following.


If direction: A tautological implication of (∀x)B → B (Ax2) is

    ⊢ (A → (∀x)B) → (A → B)        (1)

Thus, if I have Γ ⊢ A → (∀x)B, then (1) and MP yield Γ ⊢ A → B. □


6.5.5 Corollary. (∃ Introduction) If x does not occur free either in Γ or in B, then
Γ ⊢ A → B iff Γ ⊢ (∃x)A → B.


Note that the condition switched from A to B.
Proof. If direction: ⊢ A → (∃x)A by 6.5.1. This yields

    ⊢ ((∃x)A → B) → (A → B)        (2)

by tautological implication. If we now assume Γ ⊢ (∃x)A → B, then (2) and MP
yield Γ ⊢ A → B.


Only if direction:

(1) A → B               ⟨Γ-proved; we continue the proof⟩
(2) ¬B → ¬A             ⟨(1) + taut. implication (3.3.1)⟩
(3) ¬B → (∀x)¬A         ⟨(2) + 6.5.4; conditions met⟩
(4) ¬(∀x)¬A → B         ⟨(3) + taut. implication⟩

Line (4) really says “(∃x)A → B”. □


Corollary 6.5.5 is really the ticket to the technique of removing (∃x):


6.5.6 Metatheorem. (Auxiliary Variable Metatheorem) Assume that Γ ⊢ (∃x)A.
Moreover, assume that Γ + A[x := z] ⊢ B, where z is fresh with respect to Γ, (∃x)A
and B. Then Γ ⊢ B as well.


Proof. We can argue as follows: By the deduction theorem, Γ ⊢ A[x := z] → B.
Thus, from 6.5.5, Γ ⊢ (∃z)A[x := z] → B.
We can now calculate equationally (from hypotheses Γ):

      (∃z)A[x := z] → B
    ⇔ ⟨SL + 6.4.5; z is fresh for A; “C-part” is p → B⟩
      (∃x)A → B
    ⇔ ⟨SL + 2.1.23; “C-part” is p → B⟩
      ⊤ → B
    ⇔ ⟨tautology⟩
      B □


(1) The seemingly weaker hypothesis that “z is fresh with respect to Γ, A, and B” also
lets the proof through, as we immediately see from the first step (first ⇔). Obtaining
the first line of the equational proof only requires z not to occur free in Γ ∪ {B}.
However, note that


• Under the weaker hypothesis, if z is x, then the theorem trivializes to 2.1.6, by
the dual of “⊢ A ≡ (∀x)A when x not free in A” and 4.1.34. We learn nothing
new.


• Thus, the case of practical importance is when there is a nontrivial (∃x)-prefix
that the metatheorem teaches us how to “remove”; i.e., a prefix that actually
binds some free x in A. But then z is not x if the former does not occur in
A. Therefore, in the “interesting case”, the weaker restriction “z is fresh with
respect to A” is the same as the one stated in the metatheorem: “z is fresh with
respect to (∃x)A”.


(2) Very Important! There is nothing cryptic about the metatheorem, which is
actually used a lot by folks who write in mathematics, in computer science, and in
other fields where mathematical reasoning is called for. The intuition behind it is
this:

If I know that (∃x)A holds, then this tells me, intuitively, that for some (value of)
x, A(x) holds.¹¹⁴
So, even though I do not know (or care) which value makes A(x) hold, I can name
z any such value and say, “Okay then, let z be one of those values of x that make
A(x) hold”; for short I assume A(z)!

¹¹⁴ Where, by “A(x)” I simply want to draw attention, notationally, to A’s dependence on x.

This additional (auxiliary) assumption helps a lot toward proving B:

(a) Because more assumptions make proofs easier!

(b) This assumption is potentially easier to manipulate with Boolean techniques
than (∃x)A is, because unlike the latter, whose Boolean abstraction (4.1.25) is just a
formula of the form ¬p, A[x := z] (or A(z)) may have Boolean connectives that I
can profitably use in conjunction with Post’s theorem.

The metatheorem guarantees that once this auxiliary assumption, A[x := z], has
served its purpose, it drops out of the picture, leaving us with just the fact Γ ⊢ B.

Intuitively, this is because the assumption A(z) is an alternative way to say
“(∃x)A”, this way: “Some unspecified but fixed value z of x makes A(x) hold”, so
it does not really add anything new that Γ did not already know about. Recall that
Γ can prove (∃x)A.


Pause. How exactly does the metatheorem suggest the intuitive interpretation that
z is fixed? By inserting “A[x := z]” in a proof of B as an auxiliary hypothesis. This
hypothesis prevents us from using generalization (∀z) anywhere in the proof below
the point of insertion (review 6.1.1). Therefore, z behaves like a constant, being
unavailable for universal quantification (generalization)!


This metatheorem is a mathematical phenomenon entirely analogous to what happens
with induction over ℕ: There we want to prove 𝒫(n) from some assumptions
Γ, and for arbitrary n. Out of the blue we add an assumption, that 𝒫(k) holds for
all k < n, and use it in our proof.

But when the dust settles we say that we have proved 𝒫(n) only from Γ, not from
Γ + “𝒫(k) holds for all k < n”. The additional assumption (I.H.) is gone!


6.5.7 Corollary. Assume that ⊢ (∃x)A. Moreover, assume that A[x := z] ⊢ B,
where z is fresh with respect to (∃x)A and B. Then ⊢ B as well.


Proof. Take Γ = ∅. □


6.5.8 Corollary. Assume that A[x := z] ⊢ B, where z is fresh with respect to (∃x)A
and B. Then (∃x)A ⊢ B as well.


Proof. Take Γ = {(∃x)A}. □


6.5.9 Remark. In the examples that follow toward illustrating the use of 6.5.6 and
its corollaries, we eliminate the existential prefix from a formula (∃x)A that occurs
in a proof from Γ, say at line (n), by introducing, anywhere below line (n), the
formula A[x := z] with the terse annotation auxiliary hypothesis associated with (n);
z fresh.

According to the requirements of 6.5.6, “z fresh” means all three below:


(i) z does not occur in any hypotheses (auxiliary or not) written before this step.

(ii) z does not occur in the already-written existential formula (∃x)A (the
“associate” of A[x := z]).

(iii) z does not occur in the formula we want to prove.

In practice, if we interpret “z fresh” more strongly, as (iii) plus it does not occur in
any formula written in the proof before this step, then we are covered. □
6.5.10 Example. We prove here, just for practice, that ⊢ (∃x)(∀y)A → (∀y)(∃x)A.

I'll give you two proofs:


First proof: I use the deduction theorem, so I prove (∃x)(∀y)A ⊢ (∀y)(∃x)A
instead:


(1) (∃x)(∀y)A        (hypothesis)
(2) (∀y)A[x := z]    (auxiliary hypothesis associated with (1); z fresh)
(3) A[x := z]        ((2) + spec)
(4) (∃x)A            ((3) + 6.5.2)
(5) (∀y)(∃x)A        ((4) + 6.1.1; lines (1, 2) [hypotheses] have no free y)


By 6.5.6 (or 6.5.8; Γ = {(∃x)(∀y)A}), the “auxiliary” hypothesis (2) drops out, and
we have that line (1), alone, proves line (5). In effect, what the proof does is so
obvious as to be dull: It removes quantifiers, using appropriate tools, and reinserts
them in a different order.


You must be sure to annotate the auxiliary hypothesis as such. It is dead wrong to
say that line (2) follows from line (1).


Second proof: From ⊢ A → (∃x)A (i.e., 6.5.1) we get ⊢ (∀y)A → (∀y)(∃x)A
by 6.1.10.
Corollary 6.5.5 yields ⊢ (∃x)(∀y)A → (∀y)(∃x)A. □
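
A quick sanity check, anticipating the naive semantics of Chapter 8. The sketch below is not a proof; it merely tests the schema of 6.5.10 by brute force over one small domain, under the assumption that A is read as "the pair (x, y) belongs to a given binary relation". The helper names (all_binary_relations, exists_forall, forall_exists) are mine and purely illustrative. It also shows that the converse implication fails for the identity relation.

```python
from itertools import product


def all_binary_relations(domain):
    """Yield every binary relation on `domain` as a set of pairs."""
    pairs = [(a, b) for a in domain for b in domain]
    for bits in product([False, True], repeat=len(pairs)):
        yield {p for p, keep in zip(pairs, bits) if keep}


def exists_forall(rel, domain):
    """(∃x)(∀y)A, reading A as: (x, y) is in rel (an assumed reading)."""
    return any(all((x, y) in rel for y in domain) for x in domain)


def forall_exists(rel, domain):
    """(∀y)(∃x)A, with the same reading of A."""
    return all(any((x, y) in rel for x in domain) for y in domain)


domain = [0, 1, 2]

# The implication of 6.5.10 holds in every one of the 512 interpretations tried.
forward_ok = all(
    (not exists_forall(r, domain)) or forall_exists(r, domain)
    for r in all_binary_relations(domain)
)

# The converse fails for the identity relation: every y has some x,
# but no single x works for all y.
identity = {(a, a) for a in domain}
converse_fails = forall_exists(identity, domain) and not exists_forall(identity, domain)

print(forward_ok, converse_fails)
```

The failure of the converse is consistent with the warning about (∀x)(∃y)A → (∃y)(∀x)A in the exercises.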


6.5.11 Example. We prove that (∃x)(A → B), (∀x)A ⊢ (∃x)B.


(1) (∃x)(A → B)                (hypothesis)
(2) (∀x)A                      (hypothesis)
(3) A[x := z] → B[x := z]      (aux. hypothesis associated with (1); z is fresh)
(4) A[x := z]                  ((2) + 6.1.5)
(5) B[x := z]                  ((3, 4) + MP)
(6) (∃x)B                      ((5) + 6.5.2)


Lines (1) + (2) prove (6) by 6.5.6. I'll stop making this pedantic assertion at the end
of proofs by auxiliary hypothesis/variable. We know that the auxiliary hypothesis
drops out. No more reminders! □
6.5.12 Example. It is instructive to look at an alternative proof that uses proof by
contradiction (cf. 2.6.7). So rather than establishing (∃x)(A → B), (∀x)A ⊢ (∃x)B
we will attempt

(∃x)(A → B), (∀x)A, ¬(∃x)B ⊢ ⊥


instead:

(1)  (∃x)(A → B)       (hypothesis)
(2)  (∀x)A             (hypothesis)
(3)  ¬(∃x)B            (hypothesis)
(4)  ¬¬(∀x)¬B          ((3) + writing in full the abbreviation “∃”)
(5)  (∀x)¬B            ((4) + tautological implication (3.3.1))
(6)  A                 ((2) + 6.1.6)
(7)  ¬B                ((5) + 6.1.6)
(8)  ¬(A → B)          ((6, 7) + tautological implication)
(9)  (∀x)¬(A → B)      ((8) + gen; Okay since (1, 2, 3) have no free x)
(10) ¬(∃x)(A → B)      ((9) + ∃-abbrev. + tautol. implication)
(11) ⊥                 ((1, 10) + 2.5.7) □


6.5.13 Example. We establish (∀x)(A → B), (∃x)A ⊢ (∃x)B.


(1) (∀x)(A → B)                (hypothesis)
(2) (∃x)A                      (hypothesis)
(3) A[x := z]                  (auxiliary hypothesis associated with (2); z fresh)
(4) A[x := z] → B[x := z]      ((1) + spec (6.1.5))
(5) B[x := z]                  ((3, 4) + MP)
(6) (∃x)B                      ((5) + 6.5.2) □
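
As a cross-check on 6.5.13, here is a small brute-force sketch, again borrowing the naive semantics of Chapter 8. It assumes A and B are read as unary predicates over a finite domain; the function names are hypothetical, and testing domains of size 1 to 3 is of course evidence, not a proof.

```python
from itertools import product


def unary_predicates(domain):
    """Yield every unary predicate on `domain` as a dict: value -> bool."""
    for bits in product([False, True], repeat=len(domain)):
        yield dict(zip(domain, bits))


def no_counterexample_6_5_13(domain):
    """True if no reading of A, B over `domain` makes both hypotheses
    of 6.5.13 true while making its conclusion false."""
    predicates = list(unary_predicates(domain))
    for A, B in product(predicates, repeat=2):
        hyp1 = all((not A[d]) or B[d] for d in domain)   # (∀x)(A → B)
        hyp2 = any(A[d] for d in domain)                  # (∃x)A
        concl = any(B[d] for d in domain)                 # (∃x)B
        if hyp1 and hyp2 and not concl:
            return False
    return True


results = [no_counterexample_6_5_13(list(range(n))) for n in (1, 2, 3)]
print(results)
```

The search mirrors the formal proof: any witness for (∃x)A plays the role of the auxiliary z, and specializing the first hypothesis at that witness yields a witness for (∃x)B.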


6.5.14 Remark. An experienced mathematician or computer scientist who never
took a course in formal logic(!) would probably argue the above almost identically;
however, they would most likely opt for intuitively acceptable semantic terminology.
They would also probably prefer to write A(x) and B(x) (over the terse A and B)
to draw attention to our interest in the variable x. The argument would go something
like this:

Assume that (∀x)(A(x) → B(x)) and (∃x)A(x) are true. The truth of the
latter implies that for some value of x, say c, A(c) is true. The truth of the first
assumption (for all the values of x) implies, in particular, that A(c) → B(c)
is true.

Modus ponens yields the truth of B(c). But then it is true to say (∃x)B.


Our formal methods achieve the following:
(1) Ratify the above informal technique for handling ∃. We have already noted
that the “auxiliary variable” behaves formally like a constant throughout the proof,
so it is all right that our fictitious computer scientist used an “auxiliary constant”
c (cf. Pause on p. 181).

(2) Make the technique available to the nonexpert; the experts can fend for
themselves without the benefit of formal rules, which mostly benefit beginners.

(3) Avoid errors that might creep into a loose semantic approach (see next example!).

By the way, our fictitious mathematician wrote a Hilbert proof but did so in a
rather condensed style, without numbering. This condensed informal Hilbert style
is very common in mathematical practice. If a proof is longer, then numbers are
inserted only where needed so that one can later refer back to earlier statements. □


6.5.15 Example. Let us prove the schema ⊢ (∃x)A ∧ (∃x)B → (∃x)(A ∧ B).

We will do this posturing as “experienced computer scientists or mathematicians”.
Thus, we will attempt to imitate the condensed Hilbert style of proof given above,
using semantic terminology rather than formal (i.e., syntactic) methods:

Assume that (∃x)A(x) and (∃x)B(x)¹¹⁵ are true. Thus, for some value
of x, say c, we have A(c) and B(c) are true. But then so is A(c) ∧ B(c).
Suppressing reference to the specific c, it is true to state (∃x)(A(x) ∧ B(x)).


This is nice, crisp, and short.


And wrong! We will see in Chapter 8 that (∃x)A ∧ (∃x)B → (∃x)(A ∧ B) is
not a theorem schema.


What went wrong? Well, an inexperienced person arguing semantically often
makes this kind of error (I have seen it over and over again): “Thus, for some value
of x, say c, we have A(c) and B(c) are true.” Surely, the truth of (∃x)A(x) and
(∃x)B(x) does not imply that the same c makes both A(c) and B(c) true!

We should have said that some c and some d (possibly different) make A(c) and
B(d) true.

Note how we cannot now conclude (∃x)(A(x) ∧ B(x)): we are saved by the
“(possibly different)” qualification.

Would formal techniques be safer? Yes!


(1) (∃x)A ∧ (∃x)B    (hypothesis)
(2) (∃x)A            ((1) + tautological implication)
(3) (∃x)B            ((1) + tautological implication)
(4) A[x := z]        (auxiliary hypothesis associated with (2); z fresh)
(5) B[x := w]        (auxiliary hypothesis associated with (3); w fresh)


Thus, by the requirement for “freshness” (cf. Remark 6.5.9), the “auxiliary variables”
z and w are distinct. The variable z is the formal counterpart of c above, while w
formalizes d.

Clearly, we cannot continue the formal argument in the fallacious way of our
original informal argument. □

¹¹⁵ The experienced mathematician takes (at least) two things for granted and thus unworthy of explicit
mention:
(1) The deduction theorem
(2) “Hypothesis splitting” (2.5.2), where a hypothesis X ∧ Y splits into two hypotheses X and Y
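
The "possibly different" witnesses can be made concrete. The following minimal sketch assumes a two-element domain and two hypothetical readings of A(x) and B(x) (my choices, not the text's); it exhibits an interpretation where the hypothesis of the schema is true and its conclusion false, in the spirit of the counterexamples promised for Chapter 8.

```python
# A hypothetical interpretation: domain {0, 1}, A(x) is "x = 0", B(x) is "x = 1".
domain = [0, 1]


def A(x):
    return x == 0


def B(x):
    return x == 1


exists_A = any(A(d) for d in domain)                 # (∃x)A
exists_B = any(B(d) for d in domain)                 # (∃x)B
exists_A_and_B = any(A(d) and B(d) for d in domain)  # (∃x)(A ∧ B)

# Truth value of (∃x)A ∧ (∃x)B → (∃x)(A ∧ B) in this interpretation.
schema_value = (not (exists_A and exists_B)) or exists_A_and_B

# The two witnesses, the informal c and d, are distinct here.
witness_c = next(d for d in domain if A(d))
witness_d = next(d for d in domain if B(d))

print(schema_value, witness_c, witness_d)
```

Here the witnesses for A and B necessarily differ, which is exactly what the distinct fresh variables z and w record in the formal proof.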


6.5.16 Example. Our last example has a famous name attached to it: Bertrand
Russell. We show that for any predicate φ of arity 2 (i.e., one that accepts two
arguments)


⊢ ¬(∃y)(∀x)(φ(x, y) ≡ ¬φ(x, x))    (R)
By the tautology ¬A ≡ A → ⊥ it suffices that we show instead


⊢ (∃y)(∀x)(φ(x, y) ≡ ¬φ(x, x)) → ⊥


(1) (∃y)(∀x)(φ(x, y) ≡ ¬φ(x, x))    (hypothesis)
(2) (∀x)(φ(x, z) ≡ ¬φ(x, x))        (aux. hypothesis for (1); z fresh)
(3) φ(z, z) ≡ ¬φ(z, z)              ((2) + spec)
(4) ⊥                               ((3) + tautological implication)


If we take φ(x, y) to be “x ∈ y”, where ∈ is the “is a member of” predicate of set
theory, then (R) that we just proved (for any predicate of arity 2) becomes


⊢ ¬(∃y)(∀x)(x ∈ y ≡ x ∉ x)    (R′)


Translated in English (R′) says that there is no set y that contains all those sets that
satisfy x ∉ x (are not members of themselves).

Very remarkably, the nonexistence of such a set y, a situation known as “Russell’s
paradox”,¹¹⁶ has just been proved within pure logic without using a single set theory
axiom! □
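
Since (R) is a theorem of pure logic, no interpretation of φ can supply the forbidden y. The sketch below checks this exhaustively for small finite domains, under the assumption that φ(a, b) is read as "the pair (a, b) is in a given relation"; the function names are illustrative only, and the check is a sanity test rather than a replacement for the proof above.

```python
from itertools import product


def russell_witness_exists(rel, domain):
    """(∃y)(∀x)(φ(x, y) ≡ ¬φ(x, x)), reading φ(a, b) as (a, b) in rel."""
    return any(
        all(((x, y) in rel) == ((x, x) not in rel) for x in domain)
        for y in domain
    )


def no_witness_on_domain(n):
    """True if no binary relation on {0, ..., n-1} admits such a y."""
    domain = list(range(n))
    pairs = [(a, b) for a in domain for b in domain]
    for bits in product([False, True], repeat=len(pairs)):
        rel = {p for p, keep in zip(pairs, bits) if keep}
        if russell_witness_exists(rel, domain):
            return False
    return True


no_russell_set = [no_witness_on_domain(n) for n in (1, 2, 3)]
print(no_russell_set)
```

The reason the search always fails is the one used in line (3) of the proof: taking x to be y itself demands that φ(y, y) and ¬φ(y, y) agree.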


There is a result analogous to the “auxiliary variable metatheorem” (6.5.6), named 
the auxiliary constant metatheorem, to which we now turn our attention. It makes 
formally explicit the role of the auxiliary variable as a “constant” (cf. Pause on 
p. 181). We will not need the auxiliary constant metatheorem except in the Appendix 
to Part II. 


6.5.17 Lemma. Assume that no formula in Γ contains the constant c and that Γ ⊢ A.
Moreover, let x be a variable that does not occur in A. Then Γ ⊢ A′, where A′ is
obtained from A by replacing all the occurrences of c in it by x.


Proof. The proof, given by induction on the length of a proof of A from Γ for logic 2
of Chapter 5, is very similar to that of 6.1.1. Throughout, the result of replacing c in
a formula B by x will be denoted by B′.


¹¹⁶ The “paradox” exists in Cantor’s informal (or as we say, naive) set theory that leads us to believe that
every collection of objects is a set.

For proofs of length one we have two cases:
(1) A ∈ Γ. Then A′ is A since A cannot contain c. Thus Γ ⊢ A′.


(2) A ∈ Λ₁ (4.2.1). There are six subcases where A is a partial generalization of
one of the following:


(i) Tautology B (group Ax1). The tautology is determined by how the Boolean
connectives connect Boolean variables, constants, and prime formulae inside
B. The former two remain invariant as c is replaced by x. The prime
formulae are of three types, (∀y)C, t = s, or φ(t₁, …, tₙ). The replacement
of c by x leaves these types invariant, say, (∀y)C′, t′ = s′, or
φ(t′₁, …, t′ₙ). Thus we end up with a tautology B′.


(ii) Formula (∀y)B → B[y := t] (group Ax2). Replacing c by x results in a
formula still in group Ax2: (∀y)B′ → B′[y := t′]. Note that B′[y := t′]
is still defined, for x cannot be captured (if it entered t′), being new for A.


(iii) Formula (∀y)(B → C) → (∀y)B → (∀y)C (group Ax3). Replacing c by
x results in a formula still in group Ax3: (∀y)(B′ → C′) → (∀y)B′ →
(∀y)C′.


(iv) Formula B → (∀y)B, when y is not free in B (group Ax4). Replacing c
by x results in a formula B′ → (∀y)B′ still in group Ax4, noting that y,
being different from x, is still not free in B′.


(v) Formula y = y (group Ax5). This contains no c, so it remains invariant
upon replacing c by x.


(vi) Formula t = s → (A[y := t] ≡ A[y := s]) (group Ax6). Replacing c by
x results in a formula still in group Ax6: t′ = s′ → (A′[y := t′] ≡ A′[y :=
s′]). Note that A′[y := t′] and A′[y := s′] are still defined, for x cannot be
captured (if it entered t′ or s′), being new for A.


Assume the claim for proofs of length n where A appears. We now go to the case of
length n + 1. If A appeared before the end, then we are done by the I.H. If it appears
for the first time at the end and it is in Γ ∪ Λ₁, then the case already has been argued.
Let then A be the result of MP; that is, the formulae B and B → A (for some B)
appear before it in the proof. By I.H., Γ ⊢ B′ and Γ ⊢ B′ → A′; thus Γ ⊢ A′ by
MP. □


6.5.18 Corollary. Assume that Γ ⊢ A and that there is a proof certifying this, which
uses no formula from Γ that contains the constant c. Moreover, let x be a variable
that does not occur in A. Then Γ ⊢ A′, where A′ is obtained from A by replacing all
the occurrences of c in it by x.


Proof. Let Δ be the set of all the formulae from Γ that appear in said proof. By
6.5.17, Δ ⊢ A′. We get Γ ⊢ A′ by 2.1.1. □
6.5.19 Metatheorem. (Auxiliary Constant Metatheorem) Let c be a constant that
does not appear in the formulae A or B. Assume that Γ ⊢ (∃x)A. Moreover, let
Γ + A[x := c] ⊢ B, with a proof where the formulae invoked from Γ do not contain
the constant c. Then Γ ⊢ B as well.


Proof. Let Δ be the set of all formulae of Γ invoked in the certification of Γ + A[x :=
c] ⊢ B as in the statement of 6.5.19. Thus, Δ + A[x := c] ⊢ B. By the deduction
theorem, Δ ⊢ A[x := c] → B. Let z be new for Δ ∪ {A, B}. Thus, from 6.5.17,
Δ ⊢ A[x := z] → B. Hence, Δ ⊢ (∃z)A[x := z] → B by 6.5.5 (this part
needs that z is not free in Δ, nor in B). By 6.4.5 and an application of SL we get
Δ ⊢ (∃x)A → B; hence Γ ⊢ (∃x)A → B. By MP we have Γ ⊢ B. □


6.6 ADDITIONAL EXERCISES 
1. Show that ⊢ (∀x)(A → B) → (∃x)A → (∃x)B.
Hint. Use 4.1.17 to eliminate “∃”.
2. Show that ⊢ (∀x)(A → (B ≡ C)) → ((∀x)(A → B) ≡ (∀x)(A → C)).
3. Show that ⊢ (∀x)((A ∨ B) → C) → (∀x)(A → C).
4. Show that ⊢ (∀x)(A → (B ∧ C)) → (∀x)(A → B).
5. State and prove the ∃-dual of 6.1.8.


6. Prove the following version of the relativized ∀-monotonicity, which in the
notation of [2] (cf. p. 175) is


⊢ (∀x)_A (B → C) → (∀x)_A B → (∀x)_A C
while in standard notation it reads


⊢ (∀x)(A → B → C) → (∀x)(A → B) → (∀x)(A → C)


7. Prove
⊢ (∃x)_{A∧B} C ≡ (∃x)_A (B ∧ C)


Hint. Translate first to standard notation.
8. This relativizes 6.4.2. Prove
⊢ A ∨ (∀x)_B C ≡ (∀x)_B (A ∨ C), as long as x not free in A


Hint. Translate first to standard notation.
9. This relativizes 6.4.3. Prove


⊢ A ∧ (∃x)_B C ≡ (∃x)_B (A ∧ C), as long as x not free in A
Hint. Translate first to standard notation.


10. Prove
⊢ (∃x)_A B ∨ (∃x)_A C ≡ (∃x)_A (B ∨ C)


Hint. Translate first to standard notation.
11. Prove that if x is not free in A, then ⊢ A ≡ (∃x)A.


12. Prove the one point rule, ∃-version: ⊢ (∃x)(x = t ∧ A) ≡ A[x := t] if x is not
free in t.


13. Prove ⊢ (∃x)(A ∧ (∃y)(B ∧ C)) ≡ (∃y)(B ∧ (∃x)(A ∧ C)), on the condition
that y is not free in A and x is not free in B.


14. This is the dual of result 6 on p. 175. Prove ⊢ (∃x)_{A∨B} C ≡ (∃x)_A C ∨ (∃x)_B C.


Hint. Translate to standard notation first.


15. Prove ⊢ (∃x)(∃y)(A ∧ B ∧ C) ≡ (∃x)(A ∧ (∃y)(B ∧ C)), on the condition that
y is not free in A.


16. Prove dummy renaming for ∃: If z does not occur in A, then ⊢ (∃x)A ≡
(∃z)A[x := z].


17. Here is a suggested proof of
⊢ (∀x)(∃y)A → (∃y)(∀x)A    (*)


We split the → and go via the deduction theorem:


(1) (∀x)(∃y)A        (hypothesis)
(2) (∃y)A            ((1) + spec)
(3) A[y := z]        (auxiliary hypothesis associated with (2); z is fresh)
(4) (∀x)A[y := z]    ((3) + gen; Okay: x is not free in hypothesis line (1))
(5) (∃y)(∀x)A        ((4) + 6.5.2)


Now, you should not believe (*) (cf. 8.2.11).


However, you are asked not simply to dismiss the proof because of 8.2.11, but
rather to find precisely in which step it went wrong, and why said step is wrong.


18. Prove using the auxiliary variable metatheorem: ⊢ (∃x)(A → B) → (∀x)A →
(∃x)B.


19. Prove using the auxiliary variable metatheorem: ⊢ (∃x)B → (∃x)(A ∨ B).
20. Prove the dual of Exercise 3: (In [2] notation) ⊢ (∃x)_A C → (∃x)_{A∨B} C.


21. Let φ be a predicate of arity 2.
I claim ⊢ φ(x, y) → φ(y, x) via 6.1.19.


• Am I right or wrong in my (one-line) claim + proof, and why exactly?


• If I am wrong in my proof, it might still be possible to provide a different
correct proof.


Well, either provide such a correct proof, or definitively show that φ(x, y) →
φ(y, x) is not a theorem schema (if this is what you will end up doing, it will
require tools from 8.2.3).


22. Prove ⊢ (∃x)(A → (∀x)A).
23. Prove ⊢ x = y ∧ y = z → x = z.
24. Prove ⊢ (∀x)B → A ≡ (∃x)(B → A), provided that x is not free in A.
Hint. Be mindful of the priorities of the connectives. Use an equational proof.


25. Prove that the following is an absolute theorem schema on the condition that x is
not free in B: (∀x)(A → B) ≡ (∃x)A → B.


Hint. Be mindful of the priorities of the connectives. Use an equational proof.


26. Prove ⊢ (∃x)A → ((∃x)_A (B ∨ C) ≡ B ∨ (∃x)_A C), if x is not free in B.


Hint. Be mindful of connective priorities. Then translate in standard notation
before embarking on a proof.


27. This relativizes Exercise 24. Prove ⊢ (∃x)A → ((∀x)_A B → C ≡ (∃x)_A (B →
C)), if x is not free in C.


Hint. Be mindful of connective priorities. Then translate in standard notation
before embarking on a proof.


28. Professor N. A. Ive has submitted the following claim for publication:
⊢ A ≡ (∀x)A    (1)
He offered the following proof, and I quote:
“We know (generalization + specialization) that
⊢ A iff ⊢ (∀x)A    (*)


By the metatheorem that says ‘for any two (absolute) theorems, B and C, we have
⊢ B ≡ C’, it follows from (*) that ⊢ A ≡ (∀x)A.”


Hmm. We know from the text that (1) is wrong, so the question is: Precisely
which step is wrong in Professor Ive’s proof, and why?


29. Let φ be a predicate of arity 2.


• Explain why (∀x)(∀y)φ(x, y) → (∀y)φ(y, y) is not an instance of Ax2.
• Nevertheless, prove that (∀x)(∀y)φ(x, y) → (∀y)φ(y, y) is an absolute
theorem!


30. Let φ, ψ be predicates of one variable. Prove
(a) -+ (Vy) (W(y) > (Ve) 4(2)) + (Wa) $(2) 
31. Let φ, ψ be any unary (of arity 1) predicates, and c an (object) constant. Prove
(∀x)(φ(x) → ψ(x)), (∀z)φ(z) ⊢ ψ(c)
32. (a) For any predicate φ of arity 2 prove
⊢ (∀x)(∀y)φ(x, y) ≡ (∀y)(∀x)φ(y, x)
(b) How is the above different from the known ⊢ (∀x)(∀y)A ≡ (∀y)(∀x)A?
33. Prove or disprove the schema: “If x is not free in B, then ⊢ B ≡ (∀x)_A B”.


34. Prove or disprove the schema: “If x is not free in B, then ⊢ B ≡ (∃x)_A B”.


35. Consider the following formula, in a language with a nonlogical function symbol
f of arity 1:
(∀x)(x = f(x) → f(x) = f(f(x)))    (1)
(a) What is the Boolean abstraction of this formula? (Indicate by boxing.)
(b) Is the abstraction a tautology? Why?
(c) Whether or not the abstraction is a tautology, can you prove (1) in first-order
logic?
Hint. While this exercise is self-contained in its present context, you may
benefit by peeking in the next chapter.


36. Assume that Γ ⊢ A and that there is a proof certifying this where no formula
from Γ used in it contains the constant c. Then for some variable x we have
Γ ⊢ (∀x)A′, where A′ is obtained from A by replacing all the occurrences of c in it
by x.


37. Assume that ⊢ (∃x)A. Moreover, assume that A[x := c] ⊢ B, where c is a
constant that does not occur in A or B. Then ⊢ B as well.


38. Assume that A[x := c] ⊢ B, where c is a constant that does not occur in A or B.
Then (∃x)A ⊢ B as well.


39. Revisit and formally re-prove all the examples in Section 6.5 that follow 6.5.8, but
this time do so using a proof “by auxiliary constant” (cf. 6.5.19, and Exercises 37
and 38 above) to eliminate the leading existential quantifier.


CHAPTER 7 


PROPERTIES OF EQUALITY 


You will recall our brief discussion of “SFL” (Single-Formula Leibniz) in Section 3.4 
of Part I. That was in the context of Boolean logic. Axiom 6 below (cf. 4.2.1) 


t = s → (A[z := t] ≡ A[z := s])


is a counterpart of SFL for equality of objects. We explore here some of the con- 
sequences of Ax5-Ax6, including another counterpart of SFL, where the types of 
expressions on either side of — are non-Boolean. 


7.0.1 Lemma. ⊢ x = y → y = x and ⊢ x = y → y = z → x = z.


Proof. For ⊢ x = y → y = x here is a Hilbert-style proof:


(1) x = y → (x = x ≡ y = x)    (Ax6: “A” is z = x, “t” is x and “s” is y)
(2) x = y → x = x → y = x      ((1) + tautological implication (3.3.1))
(3) x = x                      (Ax5)
(4) x = y → y = x              ((2, 3) + tautological implication (3.3.1))
I leave the second proof to the reader as an easy exercise. □
7.0.2 Exercise. Prove ⊢ x = y → y = z → x = z. □
7.0.3 Lemma. For any function symbol f of arity n,


⊢ x = y → f(z₁, …, zᵢ, x, zᵢ₊₂, …, zₙ) = f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ)


Proof. This is a Hilbert proof in a relaxed style.
Let A stand for the formula


f(z₁, …, zᵢ, x, zᵢ₊₂, …, zₙ) = f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ)
Then, by Ax6,
⊢ x = y → (f(z₁, …, zᵢ, x, zᵢ₊₂, …, zₙ) = f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ) ≡
           f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ) = f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ))


The subformula “f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ) = f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ)”
can be dropped. Why? By (Ax5) w = w is an axiom. By 6.1.19 we get


⊢ f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ) = f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ)
“Redundant true” does the rest. □
7.0.4 Corollary. For any function symbol f of arity n,

x = y ⊢ f(z₁, …, zᵢ, x, zᵢ₊₂, …, zₙ) = f(z₁, …, zᵢ, y, zᵢ₊₂, …, zₙ)
Proof. By MP and 7.0.3. □


7.0.5 Corollary. For any function symbol f of arity n,


⊢ x₁ = y₁ → ··· → xₙ = yₙ → f(x₁, …, xₙ) = f(y₁, …, yₙ)
Proof. (Sketch) Move all the xᵢ = yᵢ to the left of “⊢” and prove instead
x₁ = y₁, …, xₙ = yₙ ⊢ f(x₁, …, xₙ) = f(y₁, …, yₙ)


This is a legitimate approach by the deduction theorem. Then we deduce using
Corollary 7.0.4


f(x₁, …, xₙ) = f(y₁, x₂, …, xₙ)                  from x₁ = y₁

f(y₁, x₂, …, xₙ) = f(y₁, y₂, x₃, …, xₙ)          from x₂ = y₂

f(y₁, y₂, x₃, …, xₙ) = f(y₁, y₂, y₃, x₄, …, xₙ)  from x₃ = y₃

Finally,

f(y₁, y₂, …, yₙ₋₁, xₙ) = f(y₁, y₂, …, yₙ₋₁, yₙ)  from xₙ = yₙ


Transitivity (Lemma 7.0.1) does the rest. □


7.0.6 Corollary. For any function symbol f of arity n, and any terms tᵢ and sᵢ,
i = 1, 2, …, n,


⊢ t₁ = s₁ → ··· → tₙ = sₙ → f(t₁, …, tₙ) = f(s₁, …, sₙ)
Proof. By 7.0.5 and the substitution theorem (6.1.19). □


7.0.7 Corollary. For any function symbol f of arity n, and any terms tᵢ and sᵢ,
i = 1, 2, …, n,


t₁ = s₁, …, tₙ = sₙ ⊢ f(t₁, …, tₙ) = f(s₁, …, sₙ)


Proof. By 7.0.6 and n applications of MP. □


7.0.8 Theorem. For any terms t, t′, s, we have that ⊢ t = t′ → s[x := t] = s[x := t′].


Proof. We do induction on the complexity of the term s (cf. 4.1.6).
Basis 1. s is a constant or a variable other than x. Then the claim reads
⊢ t = t′ → s = s


and follows by a tautological implication from ⊢ s = s. The latter is correct by Ax5
and an application of 6.1.19.


Basis 2. s is the variable x. The claim now reads
⊢ t = t′ → t = t′


and is correct (Ax1).


Induction step. s is f(t₁, …, tₙ), where f is a function symbol of arity n and
tᵢ, i = 1, …, n, are terms.


We proceed via the deduction theorem, so we add the hypothesis t = t′ and
embark upon proving


f(t₁, …, tₙ)[x := t] = f(t₁, …, tₙ)[x := t′]
that is (cf. 4.1.28)


f(t₁[x := t], …, tₙ[x := t]) = f(t₁[x := t′], …, tₙ[x := t′])


(0)   t = t′                                   (hypothesis)
(1)   t₁[x := t] = t₁[x := t′]                 ((0) + I.H. + MP)
(2)   t₂[x := t] = t₂[x := t′]                 ((0) + I.H. + MP)
  ⋮
(n)   tₙ[x := t] = tₙ[x := t′]                 ((0) + I.H. + MP)
(n+1) f(t₁[x := t], …, tₙ[x := t]) =
      f(t₁[x := t′], …, tₙ[x := t′])           ((1)–(n) + 7.0.7)
□


Theorem 7.0.8 is an SFL counterpart where both sides of — are of non-Boolean type. 
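
To see what Theorem 7.0.8 says operationally, here is a minimal sketch of term substitution s[x := t] together with a semantic spot check. Everything in it is an assumption made for illustration: the encoding of terms as strings and tuples, the function names subst and value, and the particular meanings given to f, g, and c (arithmetic modulo 5). Since terms contain no quantifiers, no variable capture can occur.

```python
def subst(s, x, t):
    """Return s[x := t]. A variable is a str; an application (or a
    constant, as a 0-ary application) is a tuple (symbol, arg1, ..., argn)."""
    if isinstance(s, str):
        return t if s == x else s
    return (s[0],) + tuple(subst(arg, x, t) for arg in s[1:])


def value(term, funcs, env):
    """Evaluate a term, given meanings for function symbols and variables."""
    if isinstance(term, str):
        return env[term]
    return funcs[term[0]](*(value(arg, funcs, env) for arg in term[1:]))


# A hypothetical interpretation over the integers modulo 5.
funcs = {
    "f": lambda a, b: (a + 2 * b) % 5,
    "g": lambda a: (a * a) % 5,
    "c": lambda: 3,
}
env = {"x": 0, "u": 4, "v": 1}

s = ("f", ("g", "x"), ("f", "x", ("c",)))   # the term f(g(x), f(x, c))
t = ("g", "u")                              # g(u)
t_prime = "v"                               # v

# Here t and t' happen to denote the same object, so t = t' is true ...
same_value = value(t, funcs, env) == value(t_prime, funcs, env)
# ... and then so is s[x := t] = s[x := t'].
lhs = value(subst(s, "x", t), funcs, env)
rhs = value(subst(s, "x", t_prime), funcs, env)
print(same_value, lhs == rhs)
```

The recursion in subst follows the same case split as the proof: the two basis cases handle variables, and the tuple case is the induction step, where substitution is pushed into each argument as in 4.1.28.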


CHAPTER 8 


FIRST-ORDER SEMANTICS—VERY 
NAIVELY 


This chapter is on naive semantics. That is, we see here what these abstract,
“meaningless” strings (the first-order formulae) actually say. Specifically, we will see
what it means, and how, to compute truth values of such formulae.

I would like to emphasize the qualifier naive. In the more advanced literature
one defines semantics rigorously: either defining the truth or falsehood of formulae
within informal mathematics (that is, in the metatheory), a process originated by
Tarski and nicknamed Tarski semantics, or defining the truth or falsehood of formulae
of the original language within some other formal theory, T, possibly over a different
language, in which case one speaks of formal semantics. In formal semantics,
formulae of the original language are first translated into formulae over the language
of T. Then one defines that a formula in the original language is “true” iff its
translation is provable in T (cf. [45, 53, 54]).

Here we do neither, but instead imitate Tarski informal semantics, albeit in a
very simple and purposely sloppy (8.1.2) manner. But this will do for our purpose,
which is to learn to easily build counterexamples that expose fallacious statements in
predicate logic.

Central to all this is the concept of an interpretation. Until now, we treated
formulae as “meaningless strings of symbols” that we knew how to manipulate
syntactically (even to prove formally) with our logical axioms and rules of inference.
But how can we give mathematical meaning to these symbols and formulae?


8.1 INTERPRETATIONS 


An interpretation of a formula is inherited from the interpretation of the symbols of
the language where it belongs¹¹⁷ via a process that we will describe below (8.1.2).

It is a tool that when applied to any formula in the language will produce an
interpreted mathematical formula of the metatheory: the “meaning” or “semantics”
of the original. As was the case in propositional logic, where the semantics of a
formula (in that case, its truth value) is not unique but, in general, depends on the
chosen state, an analogous situation holds in the first-order case: The “meaning” of
a formula is not unique.

An interpretation of a first-order language is a pair of two components, a nonempty
set D, called the domain or underlying set of the interpretation, and a translator
M, which is a mapping that assigns an appropriate mathematical object to each of
the following elements of the language: nonlogical symbols, ⊥ and ⊤, each object
variable and each propositional variable.


Let me stress that the choices of both D and M are entirely up to us.


We usually denote the pair (D, M) with the same capital letter that names the
domain, but in German calligraphic typesetting; that is, 𝔇. Clearly, the name 𝔇 does
not uniquely determine an interpretation (as neither does D) since an interpretation
also depends on M. However, the context will fend off possible ambiguities.


8.1.1 Definition. (Interpreting a Language. Step 1: Translating the Alphabet)
An interpretation 𝔇 = (D, M) that we choose gives meaning to ⊥, ⊤, to all object
and Boolean variables, and to all nonlogical symbols of the alphabet as follows,
where the result of the translation “M(...)” is written as “...^𝔇”:


(1) For each free variable x, x^𝔇 (i.e., the translation M(x)) is some member of
D.


(2) For each Boolean variable p, p^𝔇 is some member of {t, f}.


(3) ⊤^𝔇 = t and ⊥^𝔇 = f.


(4) For each object constant c of the alphabet, its translation c^𝔇 is some member of
D.


(5) For each function f of the alphabet, the translation f^𝔇 is a mathematical function
of the metatheory with the same arity as the formal f. f^𝔇 takes its inputs from
D and has its output values in D.


(6) For each predicate φ of the alphabet, the translation φ^𝔇 is a relation of the same
arity as φ that takes its inputs from D and has its output values in {t, f}.


Note that the Boolean connectives, the symbol “=”, and the brackets are not
translated into something else; they retain their standard fixed meaning and
notation. □

¹¹⁷ We recall, of course, that a first-order language consists of three significant sets: the alphabet, Term,
and WFF.


The next definition describes the inheritance mechanism through which any for- 
mula of the language inherits an interpretation that was given to its alphabet. 


8.1.2 Definition. (Interpreting a Language. Step 2: Translating a Formula)

Given a formula A over some first-order language, and an interpretation 𝔇 = (D, M)
of the language. The interpretation or translation of A via 𝔇 is a mathematical
formula of the metatheory that we denote by A^𝔇, which is constructed as follows:


(i) We replace each occurrence of ⊥, ⊤ in A by ⊥^𝔇, ⊤^𝔇 (i.e., f, t) respectively.


(ii) We replace each occurrence of a Boolean variable p in A with the specific truth
value p^𝔇 given by the interpretation of the language.


(iii) We replace each occurrence of a free variable x in A with the specific value x^𝔇
from D.


(iv) We replace each occurrence of (∀x) in A by (∀x ∈ D), which means “for all
values of x in D”.


Nonlogical elements of A:


(v) We replace each occurrence of an object constant c in A with the specific value
c^𝔇 from D.


(vi) We replace each occurrence of a function f in A with the specific function f^𝔇,
which has inputs from D and output values in D.


(vii) We replace each occurrence of predicate φ in A with the specific relation φ^𝔇,
which has inputs from D and output in {t, f}.


(viii) We emphasize once more what was left unsaid in the transformations (i)–(vii)
above: Every Boolean connective, =, and brackets are translated as themselves.
□


(1) A translation $A^{\mathfrak{D}}$ does not have any free variables or any Boolean variables;
thus we cannot make any substitutions into variables that may alter $A^{\mathfrak{D}}$. All its
object variables are bound. Indeed, by construction, each free variable $x$ and each
Boolean variable $p$ of the original (“meaningless”) $A$ has been everywhere replaced
by the value $x^{\mathfrak{D}}$ from $D$ and $p^{\mathfrak{D}}$ from $\{\mathbf{t}, \mathbf{f}\}$ respectively.

This is good, because it ensures that $A^{\mathfrak{D}}$ has a determinate truth value, t or f. We
will be writing $A^{\mathfrak{D}} = \mathbf{t}$ or $A^{\mathfrak{D}} = \mathbf{f}$ respectively.

(2) Definition 8.1.2 imitates Tarski semantics in that the translation results are
metamathematical objects. It also differs in a significant way: The rigorous definition of
Tarski semantics does not effect a text-editor-like “find/replace” textual substitution
in $A$ toward constructing its metamathematical analogue, the formula $A^{\mathfrak{D}}$. Rather,
by induction on the formula $A$, it directly computes the truth value of $A^{\mathfrak{D}}$ that is
induced by the language interpretation (cf. [45, 13, 35, 29, 53]).

Thus our approach also borrows from, but is far from being the same as, the process 
that defines formal semantics. The latter does have an intermediate interpretation step 
that produces a formal formula, which one next will check for provability in some 
theory. Our intermediate interpretation as defined here instead produces an informal 
formula of metamathematics, which one next checks for its metamathematical truth. 

I believe the reader understands at the intuitive level how to compute the truth 
value of simple mathematical formulae that have no free variables. Thus, in the 
user-friendly and natural 8.1.2 I am content only to show how we compute the 
metamathematical counterpart of a “meaningless” formula. 
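
The contrast drawn in remark (2) can be made concrete when the domain is finite. The Python sketch below is not part of the text's development: it computes a truth value by recursion on the formula, in the spirit of the Tarski-style definition, instead of producing an intermediate formula. The tuple encoding of formulae, the names `term_value` and `truth`, and the dictionary keys are all assumptions made for illustration, and the quantifier case works only because the domain can be enumerated.

```python
def term_value(term, interp, env):
    """Value in D of a term: a variable/constant name, or ("app", f, args)."""
    if isinstance(term, str):
        # Object variables get their value from env, constants from the interpretation.
        return env[term] if term in env else interp["const"][term]
    _, f, args = term
    return interp["func"][f](*[term_value(a, interp, env) for a in args])


def truth(formula, interp, env):
    """Truth value of a formula, computed by recursion on its structure."""
    tag = formula[0]
    if tag == "pred":                      # ("pred", name, args)
        _, name, args = formula
        return interp["pred"][name](*[term_value(a, interp, env) for a in args])
    if tag == "eq":                        # ("eq", t, s)
        return term_value(formula[1], interp, env) == term_value(formula[2], interp, env)
    if tag == "not":                       # ("not", A)
        return not truth(formula[1], interp, env)
    if tag == "imp":                       # ("imp", A, B)
        return (not truth(formula[1], interp, env)) or truth(formula[2], interp, env)
    if tag == "forall":                    # ("forall", x, A): x ranges over all of D
        _, x, body = formula
        return all(truth(body, interp, {**env, x: d}) for d in interp["domain"])
    raise ValueError(f"unknown formula tag: {tag}")


# (forall x) phi(x, x), with phi read as "less than or equal" and as "less than" on {0, 1, 2}
reflexive = ("forall", "x", ("pred", "phi", ["x", "x"]))
leq = {"domain": [0, 1, 2], "const": {}, "func": {}, "pred": {"phi": lambda a, b: a <= b}}
less = {"domain": [0, 1, 2], "const": {}, "func": {}, "pred": {"phi": lambda a, b: a < b}}
print(truth(reflexive, leq, {}), truth(reflexive, less, {}))
```

The `env` argument plays the role of the choices $x^{\mathfrak{D}}$ for free variables; the quantifier case overrides it for the bound variable only.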


It is useful for the discussion in the next section to also have a concept of a partially 
translated formula that may have some free variables over D. 


8.1.3 Definition. (Partial Translation of a Formula) Given a formula $A$ over a first-order
language and an interpretation, $\mathfrak{D}$, of the language, we may decide, in the
application of the translation 8.1.2, to select some particular finite set of formal object
variables $x, y, z, \ldots$ of the language, which we want to exempt (that is, all their
occurrences in $A$) from step (iii) of 8.1.2, thus leaving them untranslated as we go
from $A$ to its metamathematical counterpart.

Of course, $x, y, z, \ldots$ may or may not actually all occur free in $A$. If, say, $x$ does
not occur free in $A$, then $x$ does not occur free in the partially translated formula,
either. Otherwise, it is a free variable in the latter.

In any case, the resulting mathematical formula of the metatheory where the
$x_1, \ldots, x_n$ have not been translated, called the partial translation of $A$ by $\mathfrak{D}$ with
respect to the variables $x_1, \ldots, x_n$, will be denoted by $A^{\mathfrak{D}}_{x_1, \ldots, x_n}$ to indicate the
possible free occurrence of $x_1, \ldots, x_n$ in it. We view the $x_1, \ldots, x_n$ in the formula
$A^{\mathfrak{D}}_{x_1, \ldots, x_n}$ of the metatheory as variables that vary over $D$.

Omission of the qualifier partial will mean the full interpretation/translation (a
formula with no free variables) as per 8.1.2. □


8.1.4 Remark. It follows that $((\forall x)A)^{\mathfrak{D}}$ is $(\forall x \in D)A^{\mathfrak{D}}_x$ since the variable $x$ of $A$ is
not translated as part of the translation process of $(\forall x)A$: it is not free in $(\forall x)A$. □


8.1.5 Example. Consider the formula $\phi(x, x)$, where $\phi$ is a 2-ary predicate in some
first-order language. Here are a few possible interpretations:


(a) $D = \mathbb{N}$ (the natural numbers, $\{0, 1, 2, \ldots\}$), $\phi^{\mathfrak{D}} = {<}$.
In this context I mean “$<$” as the “less than” relation on natural numbers.
Thus, $(\phi(x, x))^{\mathfrak{D}}$ is this formula over $\mathbb{N}$: $x^{\mathfrak{D}} < x^{\mathfrak{D}}$. For example, if we
happened to take $x^{\mathfrak{D}} = 42$, then $(\phi(x, x))^{\mathfrak{D}}$ is specifically “$42 < 42$”.

By the way, we see that $(\phi(x, x))^{\mathfrak{D}} = \mathbf{f}$ no matter how we chose $x^{\mathfrak{D}}$.

Here are two partial interpretations, the first having exempted the variables $y, z$,
the second having exempted $x$:

(i) $(\phi(x, x))^{\mathfrak{D}}_{y,z}$ is $x^{\mathfrak{D}} < x^{\mathfrak{D}}$. Exempting $y$ and $z$ from the translation (8.1.3)
makes no difference to the result as neither $y$ nor $z$ occur free in $\phi(x, x)$.

(ii) $(\phi(x, x))^{\mathfrak{D}}_{x}$ is $x < x$.


(b) $D = \mathbb{N}$, $\phi^{\mathfrak{D}} = {\le}$ (the “less than or equal” relation on $\mathbb{N}$).
Thus, $(\phi(x, x))^{\mathfrak{D}}$ is this formula over $\mathbb{N}$: $x^{\mathfrak{D}} \le x^{\mathfrak{D}}$. Clearly, no matter what
the choice of $x^{\mathfrak{D}}$, we have $(\phi(x, x))^{\mathfrak{D}} = \mathbf{t}$.


(c) $D = \{0, \{0\}\}$, $\phi^{\mathfrak{D}} = {\in}$.
By “$\in$” here I mean the concrete “is a member of” relation of set theory.
Thus, $(\phi(x, x))^{\mathfrak{D}}$ is this formula over $D$: $x^{\mathfrak{D}} \in x^{\mathfrak{D}}$.

Note that every choice of $x^{\mathfrak{D}}$ makes this false (f). Indeed, $0 \in 0$ is false because
(the right copy of) 0 has no members (it is not a set), and $\{0\} \in \{0\}$ is false
because (the right copy of) $\{0\}$ contains the element “0”, not the element “$\{0\}$”;
these two are different: one is of type “number”, the other is of type “set”. □


The symbols “$<$, $\le$, $\in$” are nonlogical and have no fixed intrinsic meaning. That is
why we had to say, when we chose them to interpret “$\phi$” above, what we mean by
them. For example, we said that we mean “$\in$” to be the “is a member of” relation of
set theory.
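
To see the three interpretations of Example 8.1.5 side by side, here is a small Python sketch. It is an illustration only: the function names are made up here, the natural numbers are sampled through a finite window (an assumption, since $\mathbb{N}$ is infinite), and the set $\{0\}$ is modelled by a `frozenset` so that the number 0 counts as a non-set.

```python
def less(a, b):
    return a < b

def less_or_equal(a, b):
    return a <= b

def member(a, b):
    # Only sets have members; the number 0 is treated as an atom with no members.
    return isinstance(b, frozenset) and a in b

window = range(50)                      # finite sample of N (assumption)
domain_c = [0, frozenset({0})]          # the domain {0, {0}} of interpretation (c)

values_a = {less(x, x) for x in window}            # interpretation (a)
values_b = {less_or_equal(x, x) for x in window}   # interpretation (b)
values_c = {member(x, x) for x in domain_c}        # interpretation (c)
print(values_a, values_b, values_c)
```

Each set collects the truth values obtained over all sampled choices of $x^{\mathfrak{D}}$; a one-element set means the truth value did not depend on the choice.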


8.1.6 Example. Consider the formula $f(x) = f(y) \to x = y$, where $f$ is a 1-ary
(unary) function in some language. Here are a few possible interpretations:


(1) $D = \mathbb{N}$, $f^{\mathfrak{D}}(x) = x + 1$ for all values of $x$ in $D$ (i.e., in $\mathbb{N}$).
Thus, $(f(x) = f(y) \to x = y)^{\mathfrak{D}}$ is this formula over $\mathbb{N}$:

    $x^{\mathfrak{D}} + 1 = y^{\mathfrak{D}} + 1 \to x^{\mathfrak{D}} = y^{\mathfrak{D}}$

Note how every choice of $x^{\mathfrak{D}}$ and $y^{\mathfrak{D}}$ makes this formula true.


(2) $D = \mathbb{Z}$, where $\mathbb{Z}$ is the set of all integers, $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$. We take
here $f^{\mathfrak{D}}(x) = x^2$ for all $x$ in $\mathbb{Z}$.

Thus, $(f(x) = f(y) \to x = y)^{\mathfrak{D}}$ is this formula over $\mathbb{Z}$:

    $(x^{\mathfrak{D}})^2 = (y^{\mathfrak{D}})^2 \to x^{\mathfrak{D}} = y^{\mathfrak{D}}$

The above is true for some, and false for some other, choices of $x^{\mathfrak{D}}$ and $y^{\mathfrak{D}}$. For
example, it is false for $x^{\mathfrak{D}} = -2$ and $y^{\mathfrak{D}} = 2$.


And here are two partial interpretations:

(i) $(f(x) = f(y) \to x = y)^{\mathfrak{D}}_{x}$ is $x^2 = (y^{\mathfrak{D}})^2 \to x = y^{\mathfrak{D}}$.
(ii) $(f(x) = f(y) \to x = y)^{\mathfrak{D}}_{x,y}$ is $x^2 = y^2 \to x = y$. □


The moral from the above is: There are some interpretations of “$f(x) = f(y) \to
x = y$” that are (i.e., have truth value) false (f).
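
The two interpretations of Example 8.1.6 can be probed mechanically. The following Python sketch searches a finite window of the domain for choices of $x^{\mathfrak{D}}$, $y^{\mathfrak{D}}$ that falsify the translated formula; the name `counterexamples` and the window sizes are assumptions for illustration. A found pair is a definitive countermodel, whereas an empty result over a finite window is only evidence (not a proof) that the formula is true for all choices.

```python
def counterexamples(f, window):
    """Pairs (x, y) from the window that make  f(x) = f(y) -> x = y  false."""
    return [(x, y) for x in window for y in window if f(x) == f(y) and x != y]

successor_cases = counterexamples(lambda x: x + 1, range(0, 20))   # interpretation (1)
square_cases = counterexamples(lambda x: x * x, range(-3, 4))      # interpretation (2)
print(successor_cases)
print(square_cases)
```

The pair $(-2, 2)$ quoted in the example appears among the results for the squaring function.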


8.1.7 Example. In this example we will consider, in order, two different interpretations
with different domains $D$, each a finite subset of $\mathbb{N}$.
Consider the formula

    $x = y \to (\forall x)\,x = y$    (1)

Here are the two interpretations:


(1) $D = \{3\}$, $x^{\mathfrak{D}} = 3$, $y^{\mathfrak{D}} = 3$.

Since there is only one element in $D$, my only option is to set “$x^{\mathfrak{D}} = 3$” and
“$y^{\mathfrak{D}} = 3$”, as I did above.
Thus, formula (1) is interpreted as the following formula over $D$:

    $3 = 3 \to (\forall x \in D)\,x = 3$    (2)

By the way, formula (2) has value t because “$3 = 3$” is t and so is “$(\forall x \in D)\,x = 3$”,
since it says “all values $x$ in $D$ equal 3”.


(2) This time I take $D = \{3, 5\}$, and again $x^{\mathfrak{D}} = 3$ and $y^{\mathfrak{D}} = 3$.
Thus, formula (1) is interpreted as the following formula over this $D$:

    $3 = 3 \to (\forall x \in D)\,x = 3$    (3)

Now, formula (3) is f, because “$3 = 3$” is t as before, but this time “$(\forall x \in D)\,x = 3$”
is f, since it still says “all values $x$ in $D$ equal 3”, which fails: $D$ now has
two elements, and it is not true that both are equal to 3.


The moral is: There is some interpretation where formula (1) above is interpreted
as false (f). □
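
Because both domains in Example 8.1.7 are finite, the truth values of formulas (2) and (3) can be computed outright. The Python sketch below does so; the function name `formula_1` and its parameters are hypothetical, chosen only for this illustration.

```python
def formula_1(domain, x_value, y_value):
    """Truth value of  x = y -> (forall x in D) x = y  under the given choices."""
    hypothesis = (x_value == y_value)
    conclusion = all(x == y_value for x in domain)   # the bound x ranges over all of D
    return (not hypothesis) or conclusion

print(formula_1({3}, 3, 3))       # formula (2)
print(formula_1({3, 5}, 3, 3))    # formula (3)
```

Note that the free occurrence of $x$ receives the chosen value, while the bound occurrence ranges over the whole domain, exactly as in steps (iii) and (iv) of 8.1.2.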


8.1.8 Example. Let us look at a few interpretations of

    $(\forall x)(x \in y \equiv x \in z) \to y = z$    (1)

(1) If we take $D$ to be the “collection”¹¹⁸ of all sets, and $\in^{\mathfrak{D}}$ to mean “belongs to”
(which we still denote as “$\in$”), then we get

    $(\forall x \in D)(x \in y^{\mathfrak{D}} \equiv x \in z^{\mathfrak{D}}) \to y^{\mathfrak{D}} = z^{\mathfrak{D}}$

which is set theory’s requirement (so-called axiom of extensionality) that two
sets, $y^{\mathfrak{D}}$ and $z^{\mathfrak{D}}$, are equal if they have the same elements. This interpretation of
formula (1) is therefore true for any choice of sets $y^{\mathfrak{D}}$ and $z^{\mathfrak{D}}$.


(2) Take now $D = \mathbb{N}$ and $\in^{\mathfrak{D}} = {<}$, where once more “$<$” is the relation “less than”
on $\mathbb{N}$. Then, formula (1) is interpreted into:

    $(\forall x \in \mathbb{N})(x < y^{\mathfrak{D}} \equiv x < z^{\mathfrak{D}}) \to y^{\mathfrak{D}} = z^{\mathfrak{D}}$

which is obviously true no matter how we chose the numbers $y^{\mathfrak{D}}$ and $z^{\mathfrak{D}}$.


(3) Take $D = \mathbb{N}$ and $\in^{\mathfrak{D}} = {\mid}$, where by “$\mid$” we denote the relation “divides” (with
remainder 0). For example, $2 \mid 3$ and $2 \mid 1$ are false, but $2 \mid 4$ and $2 \mid 0$ are true.

Then, formula (1) is interpreted into:

    $(\forall x \in \mathbb{N})(x \mid y^{\mathfrak{D}} \equiv x \mid z^{\mathfrak{D}}) \to y^{\mathfrak{D}} = z^{\mathfrak{D}}$

which is also obviously true for all choices of the numbers $y^{\mathfrak{D}}$, $z^{\mathfrak{D}}$. It says: “Two
natural numbers, $y^{\mathfrak{D}}$ and $z^{\mathfrak{D}}$, are equal if they have precisely the same divisors”.


However,

(4) Take $D = \mathbb{Z}$ and $\in^{\mathfrak{D}} = {\mid}$. Now, (1) is interpreted into:

    $(\forall x \in \mathbb{Z})(x \mid y^{\mathfrak{D}} \equiv x \mid z^{\mathfrak{D}}) \to y^{\mathfrak{D}} = z^{\mathfrak{D}}$

We note that unlike the previous interpretations of (1), the current interpretation
may be false. For example, this is so if $y^{\mathfrak{D}} = 2$ and $z^{\mathfrak{D}} = -2$. The interpretation
is then the formula

    $(\forall x \in \mathbb{Z})(x \mid 2 \equiv x \mid -2) \to 2 = -2$

which is false, for the hypothesis $(\forall x \in \mathbb{Z})(x \mid 2 \equiv x \mid -2)$ is true (2 and $-2$ do
have the same divisors), but the conclusion $2 = -2$ is false. □
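
Interpretations (3) and (4) of Example 8.1.8 can be spot-checked with a few lines of Python. This is a sketch under stated assumptions: the names `divides` and `instance` are invented here, and the quantifier $(\forall x \in D)$ is approximated by a finite window. For the case $y^{\mathfrak{D}} = 2$, $z^{\mathfrak{D}} = -2$ the window loses nothing, since no integer of absolute value greater than 2 divides either number; for interpretation (3) the check only samples $\mathbb{N}$ and is not a proof.

```python
def divides(a, b):
    """a | b over the integers: b = a * k for some integer k."""
    if a == 0:
        return b == 0          # 0 divides only 0
    return b % a == 0

def instance(y, z, window):
    """Truth value of (forall x in window)(x | y  iff  x | z) -> y = z."""
    hypothesis = all(divides(x, y) == divides(x, z) for x in window)
    return (not hypothesis) or (y == z)

# Interpretation (3), sampled: y, z < 30, with x ranging over 0..30.
natural_ok = all(instance(y, z, range(0, 31)) for y in range(30) for z in range(30))
# Interpretation (4): y = 2, z = -2, with x ranging over -10..10.
integer_case = instance(2, -2, range(-10, 11))
print(natural_ok, integer_case)
```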


8.2 SOUNDNESS IN PREDICATE LOGIC 
8.2.1 Definition. (Universally, or Logically, or Absolutely, Valid Formulae) If
$A^{\mathfrak{D}} = \mathbf{t}$, for some $A$ and $\mathfrak{D}$, we will say that $A$ is true in the interpretation $\mathfrak{D}$ or that
$\mathfrak{D}$ is a model of $A$. We write this briefly as:

    $\models_{\mathfrak{D}} A$    (1)

A first-order formula, $A$, is universally valid (or just valid) iff it has as a model
every interpretation of the language where the formula belongs, i.e., (1) holds for all
interpretations $\mathfrak{D}$ of the language of $A$.
We indicate that $A$ is universally valid by dropping the $\mathfrak{D}$ subscript from (1),
writing

    $\models A$    (V)
□


¹¹⁸ Small print: We are not interested here in esoteric issues such as: “But this D is too ‘large’ and is thus not a set.”


Note the absence of the subscript “taut” from notation (V) above. This is for good
reason: $\models x = x$ is correct, but it is not the case that $\models_{\text{taut}} x = x$. For the latter says
of the abstraction, say $p$, of $x = x$ that $\models_{\text{taut}} p$, a clearly incorrect metamathematical
statement, because I may take a state $v$ with $v(p) = \mathbf{f}$. The former says that for any
$\mathfrak{D}$ and any choice of the value $x^{\mathfrak{D}}$ in $D$ I have $x^{\mathfrak{D}} = x^{\mathfrak{D}}$, which is clearly true!


8.2.2 Remark. All axioms in 4.2.1 are universally valid. We have already argued this 
claim early on at a very intuitive level in 4.2.2 and we are going to elaborate further 
here in the light of Definitions 8.1.2 and 8.1.3 without however getting into a 100% 
rigorous proof that requires a much more careful Definition 8.1.2 (we return to this 
topic and give such a definition, and a rigorous proof, in the Appendix, Section A.1). 


Valid Axioms 1: Ax1. It is easy to see that all axioms in group Ax1 are valid.
Indeed, more generally, we argue in outline that

    if $\models_{\text{taut}} A$, then $\models A$    (1)

That the converse of (1) is false was already discussed taking $A$ to
be $x = x$ as a counterexample.

Now, why is (1) a true metamathematical statement?

Well, let us assume

    $\models_{\text{taut}} A$    (2)

Of course, in the context of predicate calculus, (2) refers to the
abstraction of $A$ (cf. 4.1.27).

In that abstraction, we are told by (2), any arbitrary assignment
of values to the Boolean variables and prime subformulae of $A$¹¹⁹
will lead to a computed truth value of (the abstraction of) $A$ equal
to t.

Let us now see what happens when we fix a $\mathfrak{D}$ and try to compute
the truth value of $A^{\mathfrak{D}}$. Well, the abstraction of $A^{\mathfrak{D}}$ has exactly
the same Boolean structure as that of $A$, because all the Boolean
connectives and brackets are translated as themselves. Therefore
the abstraction of $A^{\mathfrak{D}}$ is also a tautology!


¹¹⁹ Prime subformulae were defined in 4.1.25.


We next note:

(a) A prime subformula of $A$ has one of the forms $\phi(t_1, \ldots, t_n)$,
$t = s$, or $((\forall x)B)$. Upon translation of $A$ into $A^{\mathfrak{D}}$, these prime
subformulae are transformed into $\phi^{\mathfrak{D}}(t_1^{\mathfrak{D}}, \ldots, t_n^{\mathfrak{D}})$, $t^{\mathfrak{D}} = s^{\mathfrak{D}}$,
and $((\forall x \in D)B^{\mathfrak{D}}_x)$ respectively, i.e., into prime subformulae
of $A^{\mathfrak{D}}$.
On the other hand, a Boolean variable $p$ of $A$ becomes a
subformula $p^{\mathfrak{D}}$ of $A^{\mathfrak{D}}$ (that is, the metamathematical Boolean
constant t or f). We may think of it as a “virtual” $p$ to which
we decided (by virtue of our choice of $\mathfrak{D}$) to assign the value
$p^{\mathfrak{D}}$.

(b) Now, in checking whether $A$ is a tautology, one assigns to the
prime subformulae and to the Boolean variables of $A$ arbitrary
Boolean values and computes the result for $A$ according to truth
tables. By (2), the result invariably is t.

(c) How does the computation of the truth value of $A^{\mathfrak{D}}$ relate to the
description in (b)? Well, rather than assigning truth values to
the prime subformulae of $A^{\mathfrak{D}}$ (which are direct translations
of those of $A$), one computes and uses their intrinsic truth
values instead. This latter subcomputation is informed by our
knowledge of the metatheory where these subformulae mean
something mathematically tangible (cf. the examples of the
preceding section).
On the other hand, $A^{\mathfrak{D}}$ has no Boolean variables, but at the
precise spot in the formula structure where $A$ had a $p$, $A^{\mathfrak{D}}$ has
a t or f ($p^{\mathfrak{D}}$). As we said in (a) above, we may think that this
is a “virtual” $p$ of $A^{\mathfrak{D}}$ with the assigned value $p^{\mathfrak{D}}$.

(d) Having noted that $A^{\mathfrak{D}}$ is a tautology as a Boolean formula of
the metatheory that is built from metamathematical Boolean
constants, prime subformulae, brackets, and connectives, it
follows that its truth value under the computation described
in (c) is equal to t, independently of what values one
might choose to assign to its prime subformulae. Thus it
is also independent of the intrinsic, computed value of said
subformulae.


This concludes the case for (1) above.


Valid Axioms 2: Ax2. $(\forall x)A \to A[x := t]$ is universally valid. Indeed, given
$\mathfrak{D}$, and fixing $A$, $x$, $t$, we have that $((\forall x)A \to A[x := t])^{\mathfrak{D}}$ is,
according to 8.1.2,

    $(\forall x \in D)A^{\mathfrak{D}}_x \to (A[x := t])^{\mathfrak{D}}$    (1)

We now examine (1).

The trickiest part to agree on is that the part $(A[x := t])^{\mathfrak{D}}$ of (1)
is $A^{\mathfrak{D}}_x[x := t^{\mathfrak{D}}]$ (cf. also 8.1.4). Indeed, we start with

    $A$: $\ldots[x]\ldots[x]\ldots$

where, for the sake of visualization and without loss of generality, I
show two free occurrences of $x$ (actually I may have zero or more).

Then we get

    $A[x := t]$: $\ldots[t]\ldots[t]\ldots$    (2)

and, applying 8.1.2, we get

    $(A[x := t])^{\mathfrak{D}}$: $(\ldots)^{\mathfrak{D}}[t^{\mathfrak{D}}](\ldots)^{\mathfrak{D}}[t^{\mathfrak{D}}](\ldots)^{\mathfrak{D}}$    (3)

But (3) is what becomes of

    $(\ldots)^{\mathfrak{D}}[x](\ldots)^{\mathfrak{D}}[x](\ldots)^{\mathfrak{D}}$

i.e., $A^{\mathfrak{D}}_x$, after we apply the substitution “$[x := t^{\mathfrak{D}}]$”!

With this out of the way, we readily see that (1) is true (t): So,
assume the left-hand side of $\to$ in (1), that is, that $A^{\mathfrak{D}}_x[x := i]$ is true for
all $i \in D$. But then $A^{\mathfrak{D}}_x[x := i]$ is true when $i = t^{\mathfrak{D}}$ in particular.


Valid Axioms 3: Ax3 and Ax4. I do not have anything to add to the discussion in
4.2.2.


Valid Axioms 4: Ax5. $x = x$ is interpreted as “$x^{\mathfrak{D}} = x^{\mathfrak{D}}$” in any $\mathfrak{D}$, as we have
just discussed. And this is true, no matter what the $x$ and $\mathfrak{D}$.

Clearly, the first “=” is formal, while the second is metamathematical.


Valid Axioms 5: Ax6. $t = s \to (A[x := t] \equiv A[x := s])$ is universally valid.
Indeed, let us fix $t$, $s$, $A$, $x$ and look at the $\mathfrak{D}$-interpretation of this
formula for some arbitrary $\mathfrak{D}$. As in the argument for Ax2, we
want to find that the computed truth value of (4) is t.

    $t^{\mathfrak{D}} = s^{\mathfrak{D}} \to (A^{\mathfrak{D}}_x[x := t^{\mathfrak{D}}] \equiv A^{\mathfrak{D}}_x[x := s^{\mathfrak{D}}])$    (4)

So let $t^{\mathfrak{D}} = s^{\mathfrak{D}}$ in $D$. Let us set $i = t^{\mathfrak{D}}$. But then
$A^{\mathfrak{D}}_x[x := t^{\mathfrak{D}}]$ and $A^{\mathfrak{D}}_x[x := s^{\mathfrak{D}}]$ are the same formula of the
metatheory: $A^{\mathfrak{D}}_x[x := i]$. Trivially, $A^{\mathfrak{D}}_x[x := i] \equiv A^{\mathfrak{D}}_x[x := i]$ is true. □


8.2.3 Metatheorem. (Soundness in First-Order Logic) If $\vdash A$, then $\models A$.


Proof. We argue this for the equivalent logic 2 of Chapter 5. This simplifies the
argument due to the presence of a single primitive rule of inference, MP, in that logic.

As in the proof of 3.1.3, we do induction on the length of (absolute) proofs that
contain $A$, and prove that $\models A$; that is, for every $\mathfrak{D}$ we have $A^{\mathfrak{D}} = \mathbf{t}$.

Basis. $A$ appears in a proof of length 1. But then $A$ is the only formula in the
proof, and hence is a logical axiom. We are done by 8.2.2.

As an I.H. we assume the claim for proofs of lengths $\le n$, which contain $A$.

For the induction step, let $A$ appear in an absolute proof of length $n + 1$. We have
two cases:

(1) $A$ was written for the first time before the last step. Then, in view of 4.2.9 and
the I.H., we are done.

(2) $A$ was written for the first time in the last step.

If it is a logical axiom, then the case has already been argued in the Basis step. So let instead
$A$ be derived by MP, that is, for some $B$, the formulae $B$ and $B \to A$ have already
appeared in the proof.

Now pick any $\mathfrak{D}$. By the I.H. we have

    $B^{\mathfrak{D}} = \mathbf{t}$    (*)

and $(B \to A)^{\mathfrak{D}} = \mathbf{t}$, i.e.,

    $B^{\mathfrak{D}} \to A^{\mathfrak{D}} = \mathbf{t}$    (**)

By the truth table for $\to$, (*) and (**) yield $A^{\mathfrak{D}} = \mathbf{t}$. □


We have already remarked that in predicate logic “if $\vdash A$, then $\models_{\text{taut}} A$” is false.


8.2.4 Metatheorem. (Gödel’s Completeness Theorem) If $\models A$, then $\vdash A$.


A proof is presented in the Appendix of Part II.


Soundness, just as in the case of propositional calculus, serves the purpose of
obtaining counterexamples; in first-order logic these are called countermodels. Thus,
if for some formula $A$ we do not believe that $\vdash A$, we need only to show that $\not\models A$,
that is, to find an interpretation $\mathfrak{D}$ (a countermodel) where $\not\models_{\mathfrak{D}} A$, i.e., $A^{\mathfrak{D}} = \mathbf{f}$.


8.2.5 Example. Question: Can our logic derive the “rule”

    If $\Gamma \vdash A$, then $\Gamma \vdash (\forall x)A$

without a condition on $x$?
Well, if it could derive the above, it could also derive strong generalization below,
by setting $\Gamma = \{A\}$.¹²⁰

    $A \vdash (\forall x)A$    (1)

But it cannot. Why not? Because (1) would yield via the deduction theorem

    $\vdash A \to (\forall x)A$    (2)

By soundness, (2) yields

    $\models A \to (\forall x)A$    (3)

Now, (3) is a statement schema. It is supposed to hold (if we think that (1) is all
right) for no matter what particular formula we may use for $A$ and no matter what
variable is quantified. Well, then, (3) should hold when we choose $A$ to be “$x = y$” and
the quantified variable to be “$x$”. However, as we saw in Example 8.1.7(2) (cf. Definition 8.2.1)

    $\not\models x = y \to (\forall x)\,x = y$

so (2) is no good, and (1) does not hold in our logic. □


¹²⁰ Then $A \vdash A$; hence $A \vdash (\forall x)A$.

8.2.6 Example. Knowing that strong generalization is illegal in our logic¹²¹ we can
show that certain other suggested “rules” are impossible (underivable) by “reducing
strong generalization to them”, that is, by saying:

    If I could have this rule, then I could also do (derive) strong generalization,
    but this is impossible.

For example, we cannot derive

    $A \equiv B \vdash (\forall x)(C[p := A] \to D) \equiv (\forall x)(C[p := B] \to D)$    (1)

because it yields the unprovable (in our logic)

    $A \vdash (\forall x)A$    (2)

Hence (1) is unprovable too, lest we want a contradiction. Here is the calculation
that shows that (1) derives (2):
Hypothesis is $A$:

      $(\forall x)A$
    $\Leftrightarrow$ ⟨6.1.11 and $\models_{\text{taut}} X \equiv \neg X \to \bot$⟩
      $(\forall x)((\neg p)[p := A] \to \bot)$        [here $(\neg p)[p := A]$ is $\neg A$]
    $\Leftrightarrow$ ⟨(1) and $A \vdash A \equiv \top$⟩
      $(\forall x)((\neg p)[p := \top] \to \bot)$        [here $(\neg p)[p := \top]$ is $\neg\top$]
    $\Leftrightarrow$ ⟨drop $\forall$, by $\vdash X \equiv (\forall y)X$ when $X$ has no free $y$⟩
      $\neg\top \to \bot$

The last formula is a tautology, and hence a theorem. Thus, the first line is a theorem
from $A$, as the assumption used (middle $\Leftrightarrow$) was $A$.

In the same manner one shows that “8.12b” ((1) is “8.12a” of [17]) is not strong,
i.e., the following is not valid (cf. Section 6.3).

    $D \to (A \equiv B) \vdash (\forall x)(D \to C[p := A]) \equiv (\forall x)(D \to C[p := B])$    □


¹²¹ We have to say “in our logic”. In the logic of [35, 45, 53], $A \vdash (\forall x)A$ is perfectly legal.


8.2.7 Exercise. Show that the “rule 8.12b” above is not valid in our logic. □


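8.2.8 Exercise. Given a formula $A$ over a first-order language and an interpretation
$\mathfrak{D}$ of the language, we saw that $((\forall x)A)^{\mathfrak{D}}$ is $(\forall x \in D)A^{\mathfrak{D}}_x$.

Show that $((\exists x)A)^{\mathfrak{D}}$ is $(\exists x \in D)A^{\mathfrak{D}}_x$.
Hint. Recall that $(\forall x \in D)A$ is short for $(\forall x)((x \in D) \to A)$ and $(\exists x \in D)A$
is short for $(\exists x)((x \in D) \land A)$. □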


8.2.9 Example. Why do we insist on choosing a nonempty domain $D$ in an
interpretation $\mathfrak{D}$?

Take any formula $A$. Clearly $(\forall x)A \to (\exists x)A$ is false when interpreted on an
empty domain $D$.

Why? “$(\forall x \in D)A^{\mathfrak{D}}_x$” is true, since there are no $x$ values in $D$ to use toward a
counterexample. On the other hand, “$(\exists x \in D)A^{\mathfrak{D}}_x$” is false, for it says “there exists
an $x$ value that verifies $A^{\mathfrak{D}}_x$”, but there are no values in $D$ to choose from.

“Big deal”, you say. Why should we worry about that?

Because it also happens that

    $\vdash (\forall x)A \to (\exists x)A$    (1)

If we allow empty domains $D$, the above argument shows

    $\not\models (\forall x)A \to (\exists x)A$

contradicting soundness, something we will not allow! □
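
The vacuous-truth argument of Example 8.2.9 is mirrored by how bounded quantifiers behave over finite collections. In the Python sketch below (the helper names are assumptions made for illustration), `all` over an empty collection is true while `any` is false, so the implication fails on the empty domain; on a small nonempty domain it holds for every unary relation we can define.

```python
from itertools import product

def forall(domain, pred):
    return all(pred(x) for x in domain)

def exists(domain, pred):
    return any(pred(x) for x in domain)

def forall_implies_exists(domain, pred):
    """Truth value of (forall x in D) A -> (exists x in D) A."""
    return (not forall(domain, pred)) or exists(domain, pred)

# Empty domain: the universal statement holds vacuously, the existential one fails.
empty_case = forall_implies_exists([], lambda x: True)

# Nonempty domain {0, 1, 2}: try every possible unary relation on it.
domain = [0, 1, 2]
nonempty_cases = [
    forall_implies_exists(domain, lambda x, row=row: row[x])
    for row in product([False, True], repeat=len(domain))
]
print(empty_case, all(nonempty_cases))
```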
8.2.10 Exercise. Prove (1). □


8.2.11 Exercise. Prove that $(\forall y)(\exists x)A \to (\exists x)(\forall y)A$ is not a theorem schema.
That is, show that there is a choice of $A$, $x$, $y$ such that $\nvdash (\forall y)(\exists x)A \to (\exists x)(\forall y)A$.
By the techniques of this chapter (specifically, using soundness) you have to do
this: Find an appropriate $A$ so that $\not\models (\forall y)(\exists x)A \to (\exists x)(\forall y)A$.
Hint. Take $A$ to be $y < x$ ($<$ is a nonlogical symbol, of course). Now interpret:
Take $D = \mathbb{N}$, interpreting $<$ as the “less than” relation on $\mathbb{N}$. □


8.2.12 Exercise. This was promised on p. 184. Prove that $(\exists x)A \land (\exists x)B \to
(\exists x)(A \land B)$ is not a theorem schema. That is, show that there is a choice of $A$, $x$
and $B$ such that $\nvdash (\exists x)A \land (\exists x)B \to (\exists x)(A \land B)$.

Once again, by the techniques of this chapter (soundness) you have to do this:
Find appropriate $A$ and $B$ so that $\not\models (\exists x)A \land (\exists x)B \to (\exists x)(A \land B)$.
Hint. Take $A$ and $B$ to be atomic formulae $\phi(x)$ and $\psi(x)$ respectively, where
$\phi$, $\psi$ are predicates of arity 1.

Next interpret: Take $D = \mathbb{N}$ and let $\phi^{\mathfrak{D}}(x)$ be “the number $x$ is even” and $\psi^{\mathfrak{D}}(x)$
be “the number $x$ is odd”. □
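
The shape of the countermodel suggested in the hint can be previewed on a finite domain. The Python sketch below is an illustration under an explicit assumption: it replaces $\mathbb{N}$ by the two-element domain $\{0, 1\}$ (itself a legitimate interpretation), so it does not replace the argument over $\mathbb{N}$ that the exercise asks for. The helper names are hypothetical.

```python
def exists(domain, pred):
    return any(pred(x) for x in domain)

def is_even(x):
    return x % 2 == 0

def is_odd(x):
    return x % 2 == 1

domain = [0, 1]   # finite stand-in for N (assumption)

hypothesis = exists(domain, is_even) and exists(domain, is_odd)
conclusion = exists(domain, lambda x: is_even(x) and is_odd(x))
implication = (not hypothesis) or conclusion
print(hypothesis, conclusion, implication)
```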


8.3 ADDITIONAL EXERCISES 


1. Axiom 3 implies (4.2.7) that no matter for which choice of $A$, $B$, and $x$, we have

    $\vdash (\forall x)(A \to B) \to (\forall x)A \to (\forall x)B$

Prove by an appropriate countermodel argument that the converse

    $((\forall x)A \to (\forall x)B) \to (\forall x)(A \to B)$    (1)

is not a universally valid schema.
Conclude, with reason (one sentence), that (1) cannot be a theorem schema, either.


2. Is the following schema a derived rule of our logic (that is, of logic 1 or 2)?

    $A \to B \vdash A \to (\forall x)B$, provided $x$ is not free in $A$    (2)

• If you think that it is, then give a proof in our logic.

• If you do not think so, then give a definitive reason as to why, for example,
using a concrete interpretation, or by proving the invalid “strong
generalization” using (2) as a lemma.


3. Redo Exercise 2, but for the schema

    $A \to B \vdash (\exists x)A \to B$, provided $x$ is not free in $B$


4. Would your answer change in Exercise 3 if to the left of $\vdash$ we had $(\forall x)(A \to B)$
instead?


5. Is the following schema a derived rule of our logic (that is, of logic 1 or 2)?

    $A \to B \vdash (\forall x)A \to (\forall x)B$    (3)

• If you think that it is, then give a proof in our logic.

• If you do not think so, then give a definitive reason as to why, for example,
using a concrete interpretation, or by proving the invalid strong
generalization from (3).


6. Redo Exercise 5, but for the schema

    $(\forall x)(A \to B) \vdash (\forall x)A \to (\forall x)B$

instead.


7. Redo Exercise 5, but for the schema

    $(\forall x)A \to (\forall x)B \vdash (\forall x)(A \to B)$

instead.


8. Which of the following is a derived rule?

• $A \to B \vdash (\exists x)A \to (\exists x)B$
• $(\forall x)(A \to B) \vdash (\exists x)A \to (\exists x)B$

In each case, a positive answer needs proof; a negative answer needs a precise
argument in connection with a carefully built countermodel.


9. Formulate and explore whether the relativization of the schema in the second
bullet of Exercise 8 is provable in our logic. Give precise reasons (proof or
countermodel) whichever way you choose to conjecture.


10. Is this $(\forall x)(A \lor B) \to (\forall x)A \lor (\forall x)B$ an absolute theorem schema?

• If you think, “Yes, it is”, then give a proof within our logic.

• If you think, “No, it is not”, then find specific $A$ and $B$ and an appropriate
interpretation to carefully and completely make your case for “no”.


11. Is this $(\forall x)A \lor (\forall x)B \to (\forall x)(A \lor B)$ an absolute theorem schema?

• If you think, “Yes, it is”, then give a proof within our logic.

• If you think, “No, it is not”, then find specific $A$ and $B$ and an appropriate
interpretation to carefully and completely make your case for “no”.


12. Is this $(\exists x)(A \land B) \to (\exists x)A \land (\exists x)B$ an absolute theorem schema?

• If you think, “Yes, it is”, then give a proof within our logic.

• If you think, “No, it is not”, then find specific $A$ and $B$ and an appropriate
interpretation to carefully and completely make your case for “no”.


13. Consider the proof below:

    (1) $(\exists x)A$              ⟨hypothesis⟩
    (2) $A[x := z]$                 ⟨auxiliary hypothesis associated with (1); $z$ fresh⟩
    (3) $(\forall z)A[x := z]$      ⟨(2) plus generalization; (1) has no free $z$⟩
    (4) $(\forall x)A$              ⟨(3) plus dummy renaming plus Eqn⟩

By the deduction theorem we have $\vdash (\exists x)A \to (\forall x)A$.

You have two tasks:

(a) Definitively show that $(\exists x)A \to (\forall x)A$ is not an absolute theorem schema.

(b) Grade the above “proof”. That is, find exactly where (i.e., at which step) it
went wrong, and precisely how.


14. Is the following, on condition that $x$ is not free in $B$, an absolute theorem
schema? (Cf. Exercise 26 on p. 189.)

    $(\exists x)_A (B \lor C) \equiv B \lor (\exists x)_A C$

If yes, then prove it; if no, then provide a carefully constructed countermodel.


15. Is the following, on condition that $x$ is not free in $C$, an absolute theorem
schema? (Cf. Exercise 27 on p. 189.)

    $(\forall x)_A B \to C \equiv (\exists x)_A (B \to C)$

If yes, then prove it; if no, then provide a carefully constructed countermodel.


Appendix A


Gödel’s Theorems and Computability


This appendix develops two cornerstone results of the metatheory of logic, both due
to Gödel: his completeness and (first) incompleteness theorems.

The first is the counterpart of Post’s theorem (3.2.1) for first-order logic and
intuitively says that when it comes to the notion of “absolute truth”, that is, truth as
understood philosophically for the entire edifice of mathematics, then predicate logic
speaks “the whole truth”. The second came as a shock when first announced ([16]):
When it comes to restricted or “relative” truth, that is, the truth of statements (that
have no free variables) made in some “powerful”¹ theory such as Peano arithmetic
or axiomatic set theory, the formal axiomatic method cannot speak the whole truth.


¹ “Powerful” in that these theories can express fairly complicated statements about their behavior. For
example, either of them contains a variable-free formula that in essence says “this axiomatic system cannot
prove me”.


A.1 REVISITING TARSKI SEMANTICS


This section looks more carefully at Tarski semantics, which were introduced in 8.1.
The extra care is needed so that we can now give a rigorous definition of absolute
truth that can be mathematically discussed and manipulated toward proving, as Gödel
originally did ([15]), that predicate logic is complete, that is, every absolutely true
formula has a formal proof in our logic; i.e., the calculus captures the whole truth.

The semantics, just as we did in 8.1, will be shaped within an interpretation
$\mathfrak{D} = (D, M)$, where as before $D$ will be a nonempty set and $M$ a translator of
symbols.

As we recall from 8.1.2, the goal in assigning semantics to an abstract formula
$A$ was to translate it into a “concrete” mathematical statement $A^{\mathfrak{D}}$ with the ultimate
goal of “computing” the latter’s truth value, something that we can in principle do by
putting our “knowledge of mathematics” to work. Since the process eliminates all
free variables by replacing any free variable $x$ by a value $x^{\mathfrak{D}}$ from $D$ (cf. 8.1.2), this
truth value is unambiguously obtained.

The rigorous definition of Tarski semantics, as was already noted in the remark (2) 
that followed Definition 8.1.2, bypasses this translation into a concrete formula, thus 
avoiding the implication—which we cannot make mathematically precise!—that “we 
know enough mathematics” to evaluate the resulting concrete formula as to its truth. 
The definition is thus impersonal and instead defines truth of any formula A over a 
first-order language directly and from first principles, as long as we have chosen a 
translation of the language in the style of 8.1.1. 


The reader should note that in this section the meaning of the symbol $A^{\mathfrak{D}}$ (or $M(A)$)
has changed: It now denotes a member of $\{\mathbf{t}, \mathbf{f}\}$ rather than a metamathematical
formula (see Exercises A.1.7 and A.1.8).


Central to the definition will be our ability to “replace any free $x$ by $x^{\mathfrak{D}}$”, but we
need to do such substitutions without exiting into the metamathematical realm, since
we are not building a “(meta)mathematical formula” this time around.


Pause. But how can we effect such substitutions? As soon as a formal object like
$x$ in $A$ is replaced by a metamathematical object $x^{\mathfrak{D}}$ from $D$, the resulting string will
not be a well-formed formula anymore: $x^{\mathfrak{D}}$ is not even in the acceptable alphabet!


A trick that originated with Leon Henkin and Abraham Robinson bypasses the
abovementioned difficulty: Just augment the chosen first-order alphabet (cf. 4.1.2)
to include (names of) objects from the domain $D$ that we have in mind!

Thus, as in 8.1, we start by fixing a first-order language $L$. By the way, it focuses
the mind if we think of the language as the triple of “ingredients” $(V, \text{Term}, \text{WFF})$,
where $V$ denotes the chosen first-order alphabet, just as we view an interpretation
as a pair of ingredients, $D$ and $M$ (cf. 8.1). In step two, we choose an interpretation
$\mathfrak{D} = (D, M)$ for our language $L$. In step three, the last preparatory step prior to
giving the definitive version of Definition 8.1.1 below, we import the names of all
the members of $D$ to the alphabet $V$ as (names of) new constants. It is intended that
these new constants will be translated (by $M$, to be reintroduced in A.1.5 below) as
themselves; i.e., if $i$ is such a new imported constant from $D$, its meaning under the
interpretation will be $i$. I use the same name, say “$i$”, for a given object of $D$,
both metamathematically and formally (e.g., the name for the object “three”, if this
constant is imported, will be “3” both formally and informally).


A.1.1 Definition. For any nonempty set $D$, the first-order alphabet obtained by
importing the members of $D$ as new constants is denoted by $V(D)$. That is,
$V(D) = V \cup D$. We will use the terminology “$D$-formulae” and “$D$-terms” for
formulae and terms of the original language $L$, where some free variables have been
replaced by $D$ values. □


It is an easy exercise to establish that this is tantamount to saying:


A.1.2 Definition. Given a nonempty set $D$, a $D$-formula and $D$-term are a formula
and term over the alphabet $V(D)$, respectively. The set of all $D$-formulae and
$D$-terms will be denoted by $\text{WFF}(D)$ and $\text{Term}(D)$ respectively. The augmented
language is $L(D) = (V(D), \text{Term}(D), \text{WFF}(D))$. □


A.1.3 Exercise. Fix a language $L$ and a nonempty set $D$. Then $\text{Term} \subseteq \text{Term}(D)$
and $\text{WFF} \subseteq \text{WFF}(D)$. □


A.1.4 Exercise. Fix a language $L$ and a nonempty set $D$. Then all $D$-terms and all
the $D$-formulae according to Definition A.1.2 can be obtained by repeated application
of substitutions such as $t[x := i]$ and $A[x := i]$ where both $t$ and $A$ are over the
original alphabet $V$, and $i \in D$. □


A.1.5 Definition. (Tarski Semantics, Step 1: Translating the Alphabet) Given a
first-order language $L = (V, \text{Term}, \text{WFF})$, an interpretation $\mathfrak{D} = (D, M)$
appropriate for $L$ (or just “for $L$”) is a pair where $D$ is a nonempty set and $M$ is a “translator”
or “interpretation mapping” that assigns concrete meaning to the symbols in $V(D)$.
We will often denote the result of the translation “$M(\ldots)$” as “$\ldots^{\mathfrak{D}}$”.


(1) For each free variable $x$, $x^{\mathfrak{D}}$ (i.e., the translation $M(x)$) is some member of $D$.


(2) For each Boolean variable $p$, $p^{\mathfrak{D}}$ is some member of $\{\mathbf{t}, \mathbf{f}\}$.


(3) $\top^{\mathfrak{D}} = \mathbf{t}$ and $\bot^{\mathfrak{D}} = \mathbf{f}$.


(4) For each object constant $c$ in $V$, its translation $c^{\mathfrak{D}}$ is some member of $D$.


(5) For each object constant $i$ in $D$, its translation $i^{\mathfrak{D}}$ is $i$ itself.
The letters $i, j, k, m, n$ will name constants imported from $D$, utilizing primes or
subscripts if necessary.


(6) For each function $f$ of the language $L$, the translation $f^{\mathfrak{D}}$ is a mathematical
function of the metatheory with the same arity as the formal $f$. $f^{\mathfrak{D}}$ takes its
inputs from $D$ and has its output values in $D$.


(7) For each predicate $\phi$ of the language $L$, the translation $\phi^{\mathfrak{D}}$ is a relation of the
same arity as $\phi$ that takes its inputs from $D$ and has its output values in $\{\mathbf{t}, \mathbf{f}\}$. □


We next extend M so as to give metamathematical meaning to arbitrary D-terms and to arbitrary D-formulae; by “meaning” I mean a concrete value from D ∪ {t, f}, not an “intermediate” mathematical formula. By A.1.3, this extension will also give meaning to terms and formulae of L.

The reader may want to glance briefly at Definition 1.3.5, where the “meaning function” v was extended to be meaningful not only on atomic but on all Boolean formulae. We noted there (immediately prior to the definition) that the extension is different from the original and thus some logicians would prefer a different symbol for it, say, v̄. We decided against this as it clutters notation and the context can readily fend off any ambiguity. For the same reason we should be content with using the same symbol, M, for both the original mapping of the symbols of the alphabet V(D) into concrete ones and the mapping that maps terms and formulae into their values.


A.1.6 Definition. (Tarski Semantics, Step 2: Extending M to all of L(D)) Let 𝔇 = (D, M) be an interpretation for L. We extend M to all D-terms and D-formulae, still calling it M, as follows:

D-terms: We define M(t), i.e., t^𝔇, for each D-term t:

(i) If t is an object variable, or a constant symbol of V(D), then M(t) is as in A.1.5.

(ii) If t is f(t₁, ..., tₙ) where f is any n-ary function symbol and t₁, ..., tₙ are D-terms, then t^𝔇 = f^𝔇(t₁^𝔇, ..., tₙ^𝔇).

D-formulae: By induction on the complexity (cf. 4.1.15) of such formulae A we define A^𝔇:

I. If A is the Boolean constant ⊥ or ⊤ or a Boolean variable p, then A^𝔇 has been defined in A.1.5.

II. For any D-terms t and s, (t = s)^𝔇 = t iff t^𝔇 = s^𝔇, where only the leftmost occurrence of “=” here is formal (i.e., the one in V); the others are metamathematical.

III. For any D-terms t₁, ..., tₙ and n-ary predicate symbol φ, (φ(t₁, ..., tₙ))^𝔇 = t iff φ^𝔇(t₁^𝔇, ..., tₙ^𝔇) = t.

Assuming that we have defined M for all D-formulae B that are less complex than A, we next define M(A) using the notation in 1.3.4:

IV. If A is ¬B, then A^𝔇 = F_¬(B^𝔇).

V. If A is B ∘ C, where ∘ is any of ∧, ∨, →, ≡, then A^𝔇 = F_∘(B^𝔇, C^𝔇).

VI. If A is (∀x)B, then A^𝔇 = t iff for all i ∈ D we have M(B[x := i]) = t. □
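For a finite domain D the clauses of A.1.5 and A.1.6 can be executed directly. The following is only a minimal sketch under stated assumptions: the tuple encoding of terms and formulae, the tag names ("var", "const", "dconst", "fn", "pred", and so on), and the function names subst and value are all illustrative and not part of the text. The mapping M is assumed to be a Python dictionary that sends variable, constant, function, and predicate names to their translations, and the truth values t, f are played by True, False.

```python
def subst(e, x, i):
    """Return e[x := i]: free occurrences of variable x become the imported constant i."""
    tag = e[0]
    if tag == "var":
        return ("dconst", i) if e[1] == x else e
    if tag in ("const", "dconst", "top", "bot", "bvar"):
        return e
    if tag in ("fn", "pred"):
        return (tag, e[1], tuple(subst(a, x, i) for a in e[2]))
    if tag == "not":
        return ("not", subst(e[1], x, i))
    if tag == "forall":
        # x is not free in (forall x)B, so nothing changes in that case
        return e if e[1] == x else ("forall", e[1], subst(e[2], x, i))
    # "eq" and the binary connectives
    return (tag, subst(e[1], x, i), subst(e[2], x, i))

# Truth tables of the binary connectives (the F-functions of 1.3.4)
BINARY = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "implies": lambda a, b: (not a) or b,
    "iff": lambda a, b: a == b,
}

def value(e, D, M):
    """The translation of a D-term or D-formula e in the interpretation (D, M)."""
    tag = e[0]
    if tag in ("var", "const", "bvar"):      # A.1.5 (1), (2), (4)
        return M[e[1]]
    if tag == "dconst":                      # A.1.5 (5): i translates to itself
        return e[1]
    if tag == "top":
        return True
    if tag == "bot":
        return False
    if tag in ("fn", "pred"):                # A.1.6 (ii) and III
        return M[e[1]](*(value(a, D, M) for a in e[2]))
    if tag == "eq":                          # A.1.6 II
        return value(e[1], D, M) == value(e[2], D, M)
    if tag == "not":                         # A.1.6 IV
        return not value(e[1], D, M)
    if tag == "forall":                      # A.1.6 VI
        return all(value(subst(e[2], e[1], i), D, M) for i in D)
    return BINARY[tag](value(e[1], D, M), value(e[2], D, M))   # A.1.6 V

# Hypothetical example: D = {0, 1, 2}, s is "successor mod 3", lt is "<".
D = [0, 1, 2]
M = {"x": 1, "c": 0, "s": lambda a: (a + 1) % 3, "lt": lambda a, b: a < b}
y = ("var", "y")
no_fixed_point = ("forall", "y", ("not", ("eq", ("fn", "s", (y,)), y)))
print(value(no_fixed_point, D, M))   # True: s has no fixed point on D
```

Note that the quantifier clause recurses on B[x := i], which is not a subformula of (∀x)B but is less complex than it, exactly as in clause VI.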


Pause. Why not define the extension of M over all of L(D) by induction on D-formulae A, therefore assuming the definition as given for the immediate predecessors of A? Because B[x := i] is not an i.p. of (∀x)B. This is why we resorted to doing the recursive definition with respect to the complexity of formulae instead.


A.1.7 Exercise. Given a language L and an interpretation 𝔇 = (D, M) for it, let t be a D-term. By induction on the complexity of t show that t^𝔇 ∈ D. □


A.1.8 Exercise. Given a language L and an interpretation 𝔇 = (D, M) for it. Let A be a D-formula. By induction on the complexity of A show that A^𝔇 ∈ {t, f}. □


A.1.9 Definition. (Truth and Models) Let 𝔇 = (D, M) be an interpretation for L, and A be a D-formula. We say “A is true (false) in 𝔇” iff A^𝔇 = t (f). If in particular A is over L (i.e., it contains no constants imported from D) then we say “𝔇 is a model of A”, and we write ⊨_𝔇 A, iff A^𝔇 = t.

If Σ ⊆ WFF, then we say “𝔇 is a model of Σ” and write “⊨_𝔇 Σ”, iff ⊨_𝔇 A for each A ∈ Σ. We say that “Σ is satisfiable” iff it has a model.

When Σ ∪ {A} ⊆ WFF, the notation Σ ⊨ A denotes semantic implication, also called logical implication. It means: “Every model of Σ is a model of A.” We write ⊨ A for ∅ ⊨ A. Since every interpretation is, vacuously, a model of ∅, “∅ ⊨ A” amounts to saying that every interpretation (appropriate for the language of A) is a model of A. We say then that “A is logically valid”, universally valid, or just valid. □


The following lemmata will make the flow of exposition in the proof of the 
soundness metatheorem smoother. 


A.1.10 Lemma. Let 𝔇 = (D, M) be an interpretation for L and t be a D-term that contains no x. Consider a new interpretation 𝔇' = (D', M') where D = D' and M' agrees with M everywhere, except possibly at the variable x. Then M(t) = M'(t). This last “=” is, of course, on D.


Proof. See the exercise below. □


A.1.11 Exercise. Prove A.1.10 by induction on the complexity of terms. □


For the benefit of the following lemma we revisit Definition 4.1.21 with an induc- 
tive definition of “free (bound) variable”. 


A.1.12 Definition. (Free and Bound Variables)
The case of terms: The concept “x is free in t” for terms coincides with the concept “x occurs in t”. Namely, if t is x, then x is free in t. If t is a constant or a y ≠ x, then x is not free in t. If t is f(t₁, ..., tₙ), then x is free in t iff it is free in at least one of the tᵢ.

The case of formulae:

Atomic case:
(1) x is not free in any of ⊥, ⊤, p.
(2) x is free in t = s iff it is so in at least one of t or s.
(3) x is free in φ(t₁, ..., tₙ) iff it is so in at least one of the tᵢ.

Nonatomic case:
(i) x is free in ¬A iff it is so in A.

(ii) x is free in A ∘ B, where ∘ is one of ∧, ∨, →, ≡, iff it is so in at least one of A and B.

(iii) x is free in (∀y)A iff it is free in A and it is not the same as y.

A variable that occurs in a formula A, yet is not free, is called bound in A. □


Thus, in case (iii) above, x is not free in (∀y)A in precisely two (not mutually exclusive) cases: x is not free in A, or x = y.
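The inductive clauses of Definition A.1.12 translate almost line by line into a recursive function. The sketch below is illustrative only: the tuple encoding of terms and formulae and the names variables, free_vars, and bound_vars are assumptions made for this example. It also assumes that the variable y of a quantifier prefix (∀y) counts as “occurring” in the formula.

```python
def variables(e, free_only):
    """Variables of e: only the free ones if free_only, else all that occur."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag in ("const", "top", "bot", "bvar"):
        return set()
    if tag == "forall":
        inner = variables(e[2], free_only)
        # clause (iii): x is free in (forall y)A iff free in A and not y
        return inner - {e[1]} if free_only else inner | {e[1]}
    if tag in ("fn", "pred"):
        parts = e[2]
    else:  # "eq", "not", and the binary connectives
        parts = e[1:]
    result = set()
    for part in parts:
        result |= variables(part, free_only)
    return result

def free_vars(e):
    return variables(e, True)

def bound_vars(e):
    """Variables that occur in e yet are not free in e."""
    return variables(e, False) - variables(e, True)

# (forall y) P(x, y): x is free, y is bound.
A = ("forall", "y", ("pred", "P", (("var", "x"), ("var", "y"))))
print(free_vars(A), bound_vars(A))
```

Under this definition a variable is either free or bound in a given formula, never both: in ((∀y)P(x, y)) ∧ Q(y) the variable y is free, because it is free in the second conjunct.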


Notwithstanding the fact that the Tarski semantics of a formula are the truth values t or f (not a “concrete” metamathematical formula), the above definition allows us to “translate” a first-order formula with free variables into a concrete formula with the same number of free variables and vice versa. We present the process that achieves this as a definition (A.1.14) below, but first we will introduce the notation “x⃗ₙ”.


A.1.13 Definition. The symbol x⃗ₙ denotes the ordered sequence x₁, x₂, ..., xₙ. We will simply write x⃗ when the length n is either understood or is unimportant to our discussion. We call x⃗ₙ an “n-tuple” or an “n-vector”. □


A.1.14 Definition. Let L be a first-order language and 𝔇 = (D, M) an interpretation appropriate for L. A set S of n-tuples from D (synonymously, relation S(x⃗ₙ)) is first-order definable in 𝔇 over L (simply put, “definable in 𝔇” if the language is understood) iff for some formula A of the language L that has x₁, ..., xₙ as its free variables, we have, for all mᵢ in D (i = 1, ..., n):

S(m₁, ..., mₙ)² iff ⊨_𝔇 A[x₁ := m₁] ··· [xₙ := mₙ]

It is usual to write “A[t₁, ..., tₙ]” for “A[x₁ := t₁] ··· [xₙ := tₙ]” and any terms tᵢ as long as it is understood that the substituted variables are the xᵢ. A function f with inputs from D and outputs in D is definable in 𝔇 iff the relation y = f(x₁, ..., xₙ), known as the graph of f, is so definable. Some authors use the term “(first-order) expressible” (e.g., [48]) rather than “(first-order) definable” in an interpretation 𝔇. □

² “Is true” is always implied in informal mathematics when a relation “S(m₁, ..., mₙ)” is stated.


The above definition gives precision to statements such as “we code (or express) an informal statement (i.e., relation) S(x₁, ..., xₙ) into the formal language” or that “the (informal) statement S(x₁, ..., xₙ) can be written (or made) in the formal language”. What makes the statement in the formal language is a first-order formula A that defines it in the sense of A.1.14.

Conversely, any first-order formula B, of n free variables, over a language L defines (in 𝔇) the set of n-tuples

{(k₁, ..., kₙ) : ⊨_𝔇 B[k₁, ..., kₙ]}

If we call the above set (relation) R, then we can state that the formula B informally says “R(x₁, ..., xₙ)” in the sense that, for all kᵢ in D, R(k₁, ..., kₙ) holds iff ⊨_𝔇 B[k₁, ..., kₙ].
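When D is finite, the set of n-tuples defined by a formula can simply be computed by trying every tuple. The sketch below is an illustration under assumptions, not the text's method: the encoding (variables as strings, compound terms and formulae as tuples), the names term_value, holds, and defined_set, and the example interpretation (multiplication modulo 7) are all hypothetical. For brevity it evaluates a formula under an assignment of values to variables instead of substituting imported constants; Lemma A.1.19 later in this section shows the two views agree.

```python
from itertools import product

def term_value(t, M, env):
    """A term is a variable name (str) or a tuple (function_name, arg, ...)."""
    if isinstance(t, str):
        return env[t]
    return M[t[0]](*(term_value(a, M, env) for a in t[1:]))

def holds(A, D, M, env):
    """Truth value of formula A in (D, M) with variables valued by env."""
    tag = A[0]
    if tag == "eq":
        return term_value(A[1], M, env) == term_value(A[2], M, env)
    if tag == "not":
        return not holds(A[1], D, M, env)
    if tag == "implies":
        return (not holds(A[1], D, M, env)) or holds(A[2], D, M, env)
    if tag == "forall":
        return all(holds(A[2], D, M, {**env, A[1]: i}) for i in D)
    raise ValueError(f"unknown formula tag: {tag}")

def defined_set(A, free, D, M):
    """All tuples (k1, ..., kn) from D such that A[k1, ..., kn] is true in (D, M)."""
    return {ks for ks in product(D, repeat=len(free))
            if holds(A, D, M, dict(zip(free, ks)))}

D = range(7)
M = {"times": lambda a, b: (a * b) % 7}

# "x is a square": (exists y) y*y = x, written as not (forall y) not (y*y = x)
square = ("not", ("forall", "y", ("not", ("eq", ("times", "y", "y"), "x"))))
print(sorted(defined_set(square, ["x"], D, M)))   # [(0,), (1,), (2,), (4,)]

# The graph of the squaring function is defined by the formula y = x*x.
graph = ("eq", "y", ("times", "x", "x"))
print((3, 2) in defined_set(graph, ["x", "y"], D, M))   # True, since 9 mod 7 = 2
```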


We next prove a few lemmata that lead to the proof of the soundness metatheorem. 


A.1.15 Lemma. Let 𝔇 = (D, M) be an interpretation for L and A be a D-formula that contains no free occurrences of x. Consider a new interpretation 𝔇' = (D', M') where D = D' and M' agrees with M everywhere, except possibly at the variable x. Then M(A) = M'(A). This last “=” is, of course, on {t, f}.


Proof. By induction on the complexity of formulae, mindful of Definitions A.1.6 and A.1.12. We note at the outset that according to the hypothesis, M(···) = M'(···) for every symbol “···” of the alphabet V(D) except, possibly, x.

Atomic case:
(1) The D-formula A is one of p, ⊤, ⊥. Then M(A) = M'(A).
(2) A is t = s. Thus

M(t = s) = t iff M(t) = M(s)
         iff M'(t) = M'(s)     by A.1.10
         iff M'(t = s) = t     by A.1.6

(3) A is φ(t₁, ..., tₙ). Thus, noting that M(φ) = M'(φ),

M(φ(t₁, ..., tₙ)) = t iff M(φ)(M(t₁), ..., M(tₙ)) = t
                      iff M'(φ)(M'(t₁), ..., M'(tₙ)) = t     by A.1.10
                      iff M'(φ(t₁, ..., tₙ)) = t             by A.1.6

Nonatomic case:
(i) A is ¬B. By A.1.12(i), the I.H. applies to B and yields M(B) = M'(B). We are done by A.1.6(IV).

(ii) A is B ∘ C. By A.1.12(ii), the I.H. applies to B and C and yields M(B) = M'(B) and M(C) = M'(C). We are done by A.1.6(V).

(iii) A is (∀y)B. The I.H. applies to B[y := i], for every i ∈ D, since x is not free in this D-formula (cf. comment immediately following A.1.12). Thus, we have M(B[y := i]) = M'(B[y := i]), for all i ∈ D. By (VI) of A.1.6 the above yields M((∀y)B) = M'((∀y)B). □


A.1.16 Lemma. Let 𝔇 = (D, M) be an interpretation for L, t be a D-term, and i ∈ D. Consider a new interpretation 𝔇' = (D', M') where D = D' and M' agrees with M everywhere, except possibly at the variable x: M' sets M'(x) = i. Then M(t[x := i]) = M'(t).

Proof. By induction on the complexity of the term t.

(1) t is x. Then M(t[x := i]) = M(i) = i (cf. A.1.5), while M'(t) = M'(x) = i.

(2) t is y (not x), or is c, or is k ∈ D. Then (cf. A.1.5) M(t[x := i]) = M(t), and M(t) is, according to the case,

M(y), thus the same as M'(y)
M(c), thus the same as M'(c)
M(k), i.e., k, thus the same as M'(k)

(3) t is f(t₁, ..., tₙ). Thus (cf. 4.1.28 and A.1.6)

M(t[x := i]) = M(f)(M(t₁[x := i]), ..., M(tₙ[x := i]))
             = M(f)(M'(t₁), ..., M'(tₙ))      by I.H.
             = M'(f)(M'(t₁), ..., M'(tₙ))     M and M' agree on f
             = M'(t) □


A.1.17 Lemma. Given a term t, distinct variables x and y, where y does not occur in t, and a constant a, then, for any term s and formula A, s[x := t][y := a] is the same as s[y := a][x := t] and A[x := t][y := a] is the same as A[y := a][x := t].


Proof. By induction on the complexity of s (done first) and A. Exercise A.1.18 asks you to fill in the details of the proof. □


A.1.18 Exercise. Prove Lemma A.1.17. □


A.1.19 Lemma. Let 𝔇 = (D, M) be an interpretation for L, A be a D-formula, and i ∈ D. Consider a new interpretation 𝔇' = (D', M') where D = D' and M' agrees with M everywhere, except possibly at the variable x: M' sets M'(x) = i. Then M(A[x := i]) = M'(A).

Proof. By induction on the complexity of formula A.

(1) A is one of p, ⊥, ⊤, (∀x)B. Then (cf. 4.1.28, A.1.6, and A.1.15) M(A[x := i]) = M(A), and M(A) is, according to the case,

M(p); this is M'(p)
M(⊥); this is M'(⊥)
M(⊤); this is M'(⊤)
M((∀x)B); this is M'((∀x)B) (by A.1.15)

(2) A is t = s. Now M(A[x := i]) = t iff M(t[x := i]) = M(s[x := i]). By A.1.16 the latter is true iff M'(t) = M'(s), i.e., iff M'(t = s) = t.

(3) A is φ(t₁, ..., tₙ). M(φ(t₁, ..., tₙ)[x := i]) = t iff M(φ)(M(t₁[x := i]), ..., M(tₙ[x := i])) = t. By A.1.16 and M(φ) = M'(φ) the latter is true iff M'(φ)(M'(t₁), ..., M'(tₙ)) = t, i.e., iff M'(φ(t₁, ..., tₙ)) = t.

Nonatomic case:
(i) A is ¬B. By I.H. M(B[x := i]) = M'(B) and we are done by A.1.6(IV).

(ii) A is B ∘ C. By I.H. M(B[x := i]) = M'(B) and M(C[x := i]) = M'(C). We are done by A.1.6(V).

(iii) A is (∀y)B and ((∀y)B)[x := i] is (∀y)(B[x := i]); the substitution being defined (cf. 4.1.28). Thus M((∀y)(B[x := i])) = t iff

M(B[x := i][y := k]) = t, for all k ∈ D     (*)

By Lemma A.1.17, (*) is equivalent to

M(B[y := k][x := i]) = t, for all k ∈ D

and hence, by the I.H., to

M'(B[y := k]) = t, for all k ∈ D     (**)

By A.1.6(VI), (**) is equivalent to M'((∀y)B) = t. □


We will need one final lemma, to ease the handling of axiom groups Ax2 and Ax6 in the proof of A.1.21 below. This lemma embodies a mathematically rigorous formulation, and proof, of the remark made in 8.2.2 on p. 203: “The trickiest part to agree on is that the part (A[x := t])^𝔇 of (1) is A^𝔇[x := t^𝔇]...”

In the present notation, we want to show that M(A[x := M(t)]) = M(A[x := t]). More pleasing intuitively is the notation introduced in Definition A.1.14: M(A[M(t)]) = M(A[t]).

A.1.20 Lemma. Let 𝔇 = (D, M) be an interpretation for a language L and s, t, and A be D-terms and a D-formula respectively, while M(t) = i. Then M(s[x := t]) = M(s[x := i]) and M(A[x := t]) = M(A[x := i]).


Proof. We first do induction on the complexity of s. If s is a constant or y (y ≠ x), then both s[x := t] and s[x := i] are just s. Thus, M(s[x := t]) = M(s) = M(s[x := i]). If s is x, then s[x := t] is t while s[x := i] is i. Thus, M(s[x := t]) = M(t) = i = M(i) = M(s[x := i]).

For the induction step let s be f(t₁, ..., tₙ). Then M(s[x := t]) = M(f)(M(t₁[x := t]), ..., M(tₙ[x := t])). By the I.H. this is M(f)(M(t₁[x := i]), ..., M(tₙ[x := i])); that is, M(s[x := i]).

We next do an induction on the complexity of A. For the atomic case, the subcases where A is one of ⊥, ⊤, p are trivial. If on the other hand A is φ(t₁, ..., tₙ), then M(A[x := t]) = M(φ)(M(t₁[x := t]), ..., M(tₙ[x := t])). By the case for terms, this is M(φ)(M(t₁[x := i]), ..., M(tₙ[x := i])); that is, M(A[x := i]). Similarly if A is an equation between two D-terms.

The induction step in the case of Boolean connectives being straightforward, let us do the induction step just in the case where A is (∀w)B. If w is the same as x, then the result is trivial. As usual, we assume that the noted substitutions are defined, otherwise there is nothing to prove. This entails, in particular, that either w does not occur in t (the interesting case) or that x is not free in B, this being the trivial case where both substitutions produce the same formula, (∀w)B (cf. 4.1.28 and 4.1.33). We display the interesting case.

M(A[x := t]) = t iff M(((∀w)B)[x := t]) = t
             iff M((∀w)(B[x := t])) = t
             iff M(B[x := t][w := j]) = t for all j ∈ D, by A.1.6
             iff M(B[w := j][x := t]) = t for all j ∈ D, by A.1.17
             iff M((B[w := j])[x := t]) = t for all j ∈ D
             iff M((B[w := j])[x := i]) = t for all j ∈ D, by I.H.
             iff M(B[w := j][x := i]) = t for all j ∈ D
             iff M(B[x := i][w := j]) = t for all j ∈ D, by A.1.17
             iff M((∀w)(B[x := i])) = t by A.1.6
             iff M(((∀w)B)[x := i]) = t
             iff M(A[x := i]) = t □


We are ready to revisit soundness within the definition of Tarski semantics. “Our logic” will here be “logic 2” of Chapter 5.

A.1.21 Metatheorem. (Soundness in First-Order Logic) Let Σ be any theory over a language L and let A be a formula of L. Then Σ ⊢ A implies Σ ⊨ A.


Proof. Assume Σ ⊢ A. Then we have one of the three cases:

(i) A ∈ Σ (trivial).

(ii) A is derived using MP.

(iii) A ∈ Λ₂.

Toward case (ii) we show that the rule MP preserves truth. Let then 𝔇 be an arbitrary model of Σ, and B satisfy B^𝔇 = t and (B → A)^𝔇 = t. Thus A^𝔇 = t by Definition A.1.6, parts (IV) and (V).

Most of the work is for case (iii), and we might as well prove a bit more, namely, that ⊨ A.

The main effort here is to show that each instance A of the schemata in Ax1–Ax6 of Definition 4.2.1, as they appear in the list (i.e., prior to prefixing universal quantifiers to form a partial generalization), satisfies ⊨ A.

However, we postpone this task until after we show that prefixing universal quantifiers preserves truth. The two steps together complete the argument.

Generalization preserves truth; that is, if ⊨ A, then ⊨ (∀x)A. This translates as “if M(A) = t in every interpretation 𝔇 = (D, M), then also M((∀x)A) = t in every interpretation 𝔇 = (D, M)”. Well, suppose instead that

⊨ A     (*)

yet, for some interpretation 𝔇 = (D, M) we have M((∀x)A) = f. By A.1.6, this says that for some i ∈ D, we have

M(A[x := i]) = f     (**)

Choose now a new interpretation, 𝔇' = (D', M') where D = D' and M and M' agree everywhere on the alphabet of L, except possibly at x where M'(x) = i. By Lemma A.1.19, we have

M(A[x := i]) = M'(A)     (***)

But M'(A) = t by (*), which, in view of (***), contradicts (**).

We return to case (iii).
Case of Ax1: ⊨_taut A. We pick an arbitrary 𝔇 = (D, M) and show that M(A) = t. Let p₁, ..., pₙ be all the propositional variables that occur in A, including under propositional variables both those that are from the alphabet V of L and those that are actually prime subformulae.³ Define a state v by setting v(pᵢ) = M(pᵢ), for i = 1, ..., n. By induction on the complexity of “P-formulae” (cf. Exercise 4 on p. 149) we show that v(A) = M(A) for each P-formula A (i.e., for each A ∈ WFF). The basis is settled at once by the way v was defined and since, by the definition of state, and A.1.5, v and M agree on ⊤, ⊥. Let then A be ¬B. By I.H. it is v(B) = M(B). Thus, by 1.3.5 (first “=”) and A.1.6(IV) (last “=”) we have v(¬B) = F_¬(v(B)) = F_¬(M(B)) = M(¬B). If finally A is B ∘ C (where ∘ is as before), then the I.H. yields v(B) = M(B) and v(C) = M(C). Thus, by 1.3.5 (first “=”) and A.1.6(V) (last “=”) we have v(B ∘ C) = F_∘(v(B), v(C)) = F_∘(M(B), M(C)) = M(B ∘ C).

Now, the assumption yields v(A) = t. By the preceding result, we have M(A) = t.

Case of Ax2: A is (∀x)B → B[x := t]. We pick an arbitrary 𝔇 = (D, M) and show that M(A) = t. As M(A) = F_→(M((∀x)B), M(B[x := t])) we need establish that if M((∀x)B) = t, then M(B[x := t]) = t. Well, the hypothesis yields (cf. A.1.6(VI)) that M(B[x := i]) = t, for all i ∈ D. In particular, M(B[x := M(t)]) = t. Hence M(B[x := t]) = t by A.1.20.

Case of Ax3: A is (∀x)(B → C) → (∀x)B → (∀x)C. We pick an arbitrary 𝔇 = (D, M) and show that M(A) = t. Thus, by the truth table for F_→ (1.3.4) I assume M((∀x)(B → C)) = t and M((∀x)B) = t and prove M((∀x)C) = t. The assumptions mean

M(B[x := i] → C[x := i]) = t, for all i ∈ D     (*)

M(B[x := i]) = t, for all i ∈ D     (**)

hence M(C[x := i]) = t, for all i ∈ D, and we are done (A.1.6(VI)).

Case of Ax4: A is B → (∀x)B, where x is not free in B. As before, we work with an arbitrary 𝔇 = (D, M) and show that M(A) = t. So assume M(B) = t and check whether M((∀x)B) = t. The latter will be so precisely if M(B[x := i]) = t, for all i ∈ D. But, by the assumption on x, B[x := i] is just B (cf. 4.1.33).

Case of Ax5: A is x = x. We want x^𝔇 = x^𝔇 (cf. A.1.6(II)) for any 𝔇. This is immediate, since every member of D equals itself in the metatheory.

Case of Ax6: The axiom is t = s → (A[x := t] ≡ A[x := s]). So, pick an arbitrary 𝔇 = (D, M) and assume M(t = s) = t, that is, M(t) = M(s). We want M(A[x := t] ≡ A[x := s]) = t, that is (cf. F_≡ in 1.3.4), M(A[x := t]) = M(A[x := s]). By A.1.20 that last equality is equivalent to M(A[x := M(t)]) = M(A[x := M(s)]) and thus holds. □

³ Prime subformulae were defined in 4.1.25.


A.1.22 Exercise. Prove that if a set of formulae Γ has a model (cf. A.1.9), then it is consistent. □


A.2. COMPLETENESS 


For ages mathematicians were content with arguing informally as they were building the edifice that is mathematics, until they encountered the various paradoxes that an undisciplined informal approach entails, such as Russell’s paradox. This led to a movement, major proponents of which were Whitehead and Russell ([56]), Hilbert ([22]), and Bourbaki ([2]), that advocated the construction, using the techniques of mathematics, of a robust “engine” with the help of which mathematicians could prove their various theorems within rigid, finitary processes (formal proofs), and with absolutely clear rules of reasoning and selection of assumptions. This “engine” is, of course, first-order logic. By “robustness” I refer to the inherent inability of this logic to derive contradictions. By “inherent” I mean here that the absolute (nonapplied) first-order logic, having no nonlogical axioms that can “go wrong”, is contradiction-free: Indeed, ⊬ ⊥ (cf. also 2.6.6); otherwise, by the soundness metatheorem of the previous section (A.1.21), we obtain the absurd ⊥^𝔇 = t in every interpretation 𝔇.

But how many “truths” can we prove within first-order logic? Does this logic tell the whole truth? Gödel proved that it does (the logic is complete) in this sense: If a formula is true in every model of a chosen set of assumptions (i.e., it is semantically implied by these assumptions; cf. A.1.9), then it is also provable from these assumptions. In this section we prove Gödel’s completeness theorem for so-called countable languages L, i.e., languages over countable alphabets (see the discussion under the heading “A brief course in countable sets” on p. 224 below for the definition of countable). The proof given here is not Gödel’s original ([15]) but is the “modernized” argument due to Leon Henkin ([20]).

The strategy for establishing the completeness of our logic, that is, the implication “if Σ ⊨ A, then Σ ⊢ A”, is to prove the equivalent contrapositive:

If

Σ ⊬ A     (1)

then

Σ ⊭ A     (2)


We will recall that on p. 116 we introduced the concept of an applied first-order logic 
or theory. It was stated that a theory is a toolbox consisting of 


(i) A first-order language that has a hand-picked, specific set of nonlogical symbols that is appropriate for the intended application (e.g., for set theory we just include the predicate ∈; nothing else).

(ii) Special axioms that give the basic properties of the nonlogical symbols. Intuitively, these state selected fundamental relative truths that characterize the theory.

(iii) The logical axioms that are common to all theories. Intuitively, these state absolute truths that are valid in all theories.

(iv) Rules of inference.


It is standard practice in the literature to take most of these tools for granted and identify a theory with the set of its special axioms Σ, which subsume item (i) above. Thus, a theory is simply any set of formulae, Σ. Any such set, and therefore any theory, is called consistent precisely as we defined in the remark following 2.6.6: if and only if it fails to prove at least one formula; equivalently, if and only if it fails to prove ⊥. Thus, the contrapositive formulation of the completeness theorem, “(1) implies (2)”, immediately yields (take A to be ⊥) that a consistent theory has a model.

To prove that (1) implies (2) we fix a countable first-order language L and a theory Σ over L that does not prove A, and proceed to construct a model 𝔇 = (D, M) for Σ that is not a model of A (cf. also the approach in the proof of 3.2.1).

To make the construction (which has substantial methodological overlap with the one involved in the proof of Post’s theorem in Section 3.2) self-contained, I outline below, with straightforward informal proofs where needed, a number of facts from set theory that we will need.


A brief course in countable sets. A set A⁴ is countable if it is empty or (in the opposite case) if there is a way to arrange all its members, possibly with repetitions, in an infinite linear array, in a “row of locations”, utilizing one location for each member of ℕ. Since it is allowed to repeatedly list any element of A, even infinitely many times, all finite sets are countable.


We can convert a two-dimensional enumeration

(mᵢⱼ), for all i, j in ℕ

into a one-dimensional (one row) enumeration quite easily. The “linearization” or “unfolding” of the infinite matrix of rows is effected by walking along the arrows as follows:

(0,0)   (0,1)   (0,2)   (0,3)   ...
      ↗       ↗       ↗
(1,0)   (1,1)   (1,2)   ...
      ↗       ↗
(2,0)   (2,1)   ...
      ↗
(3,0)   ...
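The walk along the diagonals can be sketched as a generator. This is only an illustration: the names linearize and position are made up for the example, and the direction of travel along each diagonal (from the first column up to the first row) is an assumption; the opposite direction serves equally well.

```python
from itertools import count, islice

def linearize():
    """Yield every pair (i, j) of natural numbers, one diagonal at a time."""
    for k in count():              # k = i + j identifies the diagonal
        for j in range(k + 1):     # walk from (k, 0) up to (0, k)
            yield (k - j, j)

def position(i, j):
    """Index at which (i, j) is produced by linearize()."""
    k = i + j
    return k * (k + 1) // 2 + j    # k(k+1)/2 pairs lie on earlier diagonals

print(list(islice(linearize(), 6)))
# [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```

Since every pair (i, j) lies on the finite diagonal i + j, it is reached after finitely many steps; this is what makes the one-row listing an enumeration of the whole matrix.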


Suppose now that A is a countable set. It is clear that every subset of A is countable: If ∅ ≠ B ⊆ A, then we enumerate the elements of B as follows:

Fix a member b ∈ B that will be used as explained below. We now form two arrays, an auxiliary array and a target array. The latter will contain an enumeration of B. Enumerate A by steps. In each step we produce the next member of A, say a, which we put in the first unused location of the auxiliary array. If a ∈ B, then we also place it in the first unused location of the target array. Otherwise we put b in that location at this step.

⁴ This appendix often refers to sets by name. Much of the time such names are capital Latin letters, A, B, C, unless I refer to a set of formulae (Σ, Δ, Γ). The context will protect us from confusing these A, B, C for formulae.
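The two-array procedure can be sketched as follows, under the assumption that the enumeration of A is given as a Python iterator (which plays the role of the auxiliary array) and that membership in B is given as a predicate. The names enumerate_subset, enum_A, and in_B are hypothetical.

```python
from itertools import count, islice

def enumerate_subset(enum_A, in_B, b):
    """Yield the target array: a if a is in B, else the fixed filler b from B."""
    for a in enum_A:               # the next member of A, step by step
        yield a if in_B(a) else b

# Example: A is the set of natural numbers, B the even numbers, and b = 0.
evens = enumerate_subset(count(), lambda a: a % 2 == 0, 0)
print(list(islice(evens, 6)))      # [0, 0, 2, 0, 4, 0]
```

Every member of B eventually appears, because it appears in the enumeration of A; repetitions of b are harmless because enumerations may repeat elements.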


Another fact that we will need is that if A ≠ ∅ is countable, then it also has an enumeration where every element appears infinitely often. Indeed, fix an enumeration a₀, a₁, a₂, ... of A. Form the infinite matrix whose every row is a₀, a₁, a₂, ... and linearize the matrix in the manner we linearized (mᵢⱼ), for all i, j in ℕ, above. That is, enumerate A as follows:

a₀,
a₀, a₁,
a₀, a₁, a₂,
a₀, a₁, a₂, a₃,
...
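As a small sketch of this listing, assume the fixed enumeration of A is given as a function enum that maps n to the n-th element; the name with_repetitions is hypothetical.

```python
from itertools import count, islice

def with_repetitions(enum):
    """List a0; a0, a1; a0, a1, a2; ... so each element recurs in every later block."""
    for k in count():
        for j in range(k + 1):
            yield enum(j)

print(list(islice(with_repetitions(lambda n: n), 10)))
# [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
```

The element with index j appears once in each of the blocks j, j + 1, j + 2, ..., hence infinitely often.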


Examples of nonempty countable sets are: any finite set; ℕ; the set of all even integers; the set of all integers;⁵ the set of all nonnegative rational numbers.⁶

Let now A be some countable nonempty set (possibly infinite), and fix an enumeration of it, a₀, a₁, ... For example, A might be the alphabet V of a countable first-order language. The set of all strings of length two over A is the set of all aᵢaⱼ for i ≥ 0, j ≥ 0, and is linearizable in the manner of (mᵢⱼ), for all i, j in ℕ. That is, the set of all such strings is countable. This extends to strings of any length, as we can see via simple induction. Fix an n > 0 and assume that the set of strings of length n is countable, with an enumeration d₀, d₁, ... But then so is the set of strings of length n + 1, since these are all the strings of the form dᵢaⱼ, for i ≥ 0, j ≥ 0.

But how about the set of all strings over A? This is countable too! Indeed, let aⁿ₀, aⁿ₁, aⁿ₂, ... be an infinite enumeration of all strings of length n ≥ 0. Then we can enumerate all strings by linearizing the infinite matrix (aⁱⱼ), for i ≥ 0, j ≥ 0.⁷ Note that the first row, a⁰ⱼ, for j ≥ 0, consists of the empty string, ε, everywhere.
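The induction on string length and the final linearization can both be expressed through a function that inverts the diagonal walk. The sketch below is one way to arrange this and rests on assumptions: the alphabet is given as a function letter from ℕ onto A (repetitions allowed), strings are Python strings, and the names unpair, string_of_length, and all_strings are invented for the example.

```python
def unpair(k):
    """The k-th pair (i, j) met when walking the diagonals of the infinite matrix."""
    d = 0
    while (d + 1) * (d + 2) // 2 <= k:   # find the diagonal d = i + j holding index k
        d += 1
    j = k - d * (d + 1) // 2
    return (d - j, j)

def string_of_length(n, k, letter):
    """The k-th entry in an enumeration (with repetitions) of the strings of length n."""
    if n == 0:
        return ""                        # the row for length 0 is the empty string everywhere
    i, j = unpair(k)                     # strings of length n are d_i a_j, d_i of length n - 1
    return string_of_length(n - 1, i, letter) + letter(j)

def all_strings(k, letter):
    """The k-th entry in the linearization of the matrix (row n = strings of length n)."""
    n, j = unpair(k)
    return string_of_length(n, j, letter)

letter = lambda j: "ab"[j % 2]           # a two-letter alphabet, enumerated with repetitions
print(sorted({all_strings(k, letter) for k in range(40)}, key=lambda s: (len(s), s)))
```

Because the enumeration of each length is fixed here by an explicit rule, this particular construction makes no appeal to the axiom of choice mentioned in the footnote; the general argument, which merely assumes that some enumeration exists for each length, does.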


⁵ Think of an integer as a pair (n, m), where m ∈ ℕ and n = 0 or n = 1. The intention is to have (0, m) “code” (stand for) m while (1, m) codes −m. We know that the set of all (n, m) for n ≥ 0, m ≥ 0 can be linearized just as the matrix above was. The set of all integers is (coded as) a subset of this matrix.

⁶ Think of such a rational as a pair (n, m), where n ∈ ℕ and m ∈ ℕ with m ≠ 0 so that (n, m) stands for n/m. The set of all (n, m) for n ≥ 0, m ≥ 0 can be linearized just as the matrix above was. The set of the nonnegative rationals is (coded as) a subset of this matrix.

⁷ There is some esoteric small print here: In the proof of the countability of the set of all strings we tacitly used a set-theoretical principle known as the axiom of choice. It enables us to make mathematical constructions that involve an infinite set of selections from a set in the absence of a “precise rule” that lets us specify these selections. In our case, out of many possible enumerations for each string length, we chose one, for each n ≥ 0. Even more esoteric is a result of Feferman and Levy that shows the necessity of the axiom of choice principle if one wants, in general, to show that the union of a countable set (A₀, A₁, A₂, ...) of countable sets is itself countable ([14]). The said union is denoted by ⋃ₙ≥₀ Aₙ and means the set S with an “entrance condition” x ∈ S that is equivalent to “for some i, x ∈ Aᵢ”. That is, S contains all the objects found in all the Aᵢ and nothing else.

We now turn to the construction of the model we announced prior to our preceding digression into the properties of countable sets. At first we pick an infinite countable set N (for example, we can take N to be ℕ) and fix an enumeration m₀, m₁, m₂, ... of N. As usual, WFF(N) is the set of all N-formulae over L. We note that Σ ⊆ WFF(N). We next define an extension Γ of Σ, which simply means a superset of Σ, i.e., Σ ⊆ Γ ⊆ WFF(N). Γ will have a number of key properties that will lead to the Main Lemma (A.2.4).

To this end, we fix attention on an enumeration G₀, G₁, G₂, ... of all formulae of WFF(N), and on an enumeration E₀, E₁, ... of all “existential” formulae among the Gᵢ, that is, those that have the form (∃x)B. We assume without loss of generality that each formula Eᵢ occurs infinitely often in the list.


Pause. Can we do this? For sure: WFF(N) is a subset of the countable set of all strings over V(N), thus it is countable itself. In turn, the subset of WFF(N) that contains all the “existential” formulae (∃x)B, but just those, is countable, and by the results in our “course in countable sets” can be enumerated so that every member is repeated infinitely often.


We can now define a sequence Γ₀, Γ₁, ... by recursion, in two steps, using the intermediate Δₙ-sequence that we define in parallel: Let Γ₀ = Σ, and for n = 0, 1, ... let

Δₙ = Γₙ ∪ {Gₙ}     if Γₙ ∪ {Gₙ} ⊬ A
Δₙ = Γₙ ∪ {¬Gₙ}    otherwise                    (3)

This is almost identical to the construction in the proof of 3.2.1 (see also Claim Four on p. 96, and (i) below). We next let

Γₙ₊₁ = Δₙ ∪ {B[x := c]}    if Δₙ ⊢ Eₙ, where Eₙ is (∃x)B
Γₙ₊₁ = Δₙ                  otherwise            (4)

In (4) we choose the so-called Henkin constant c ∈ N so that c = mᵢ where i is the smallest such that mᵢ does not occur in any of A, G₀, ..., Gₙ, E₀, ..., Eₙ.
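Step (3) cannot be run mechanically in first-order logic, since provability is not decidable there. Its shape can nevertheless be illustrated in the propositional setting of Section 3.2, where a truth-table test can stand in for “⊢”. The sketch below is only such a propositional analogue and does not model the Henkin step (4). It assumes formulae are represented as Python functions from states (dictionaries of truth values) to truth values; the names entails and extend are hypothetical.

```python
from itertools import product

def entails(gamma, a, variables):
    """Truth-table test: every state satisfying all of gamma also satisfies a."""
    for bits in product([False, True], repeat=len(variables)):
        v = dict(zip(variables, bits))
        if all(g(v) for g in gamma) and not a(v):
            return False
    return True

def extend(sigma, a, candidates, variables):
    """Analogue of (3): add G_n if A still fails to follow, otherwise add not-G_n."""
    gamma = list(sigma)
    chosen = []
    for name, g in candidates:
        if not entails(gamma + [g], a, variables):
            gamma.append(g)
            chosen.append(name)
        else:
            gamma.append(lambda v, g=g: not g(v))
            chosen.append("not " + name)
    return gamma, chosen

variables = ["p", "q"]
sigma = [lambda v: (not v["p"]) or v["q"]]      # the single assumption p -> q
a = lambda v: v["q"]                            # the formula A that sigma fails to yield
candidates = [("p", lambda v: v["p"]), ("q", lambda v: v["q"])]

gamma, chosen = extend(sigma, a, candidates, variables)
print(chosen)                                   # ['not p', 'not q']
print(entails(gamma, a, variables))             # False: A still does not follow
```

The extended set decides every candidate formula, still fails to yield A, and reads off the state p = f, q = f, which satisfies the assumption while falsifying A.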


We now note a set of properties of the Γₙ sequence of formulae:

(i) If Γₙ ⊬ A, then Δₙ ⊬ A. This is clear if Δₙ is given by the top case. Otherwise, we have Γₙ ∪ {Gₙ} ⊢ A, which precludes Γₙ ∪ {¬Gₙ} ⊢ A (see the analogous argument in Claim Four on p. 96).

(ii) If Δₙ ⊬ A, then Γₙ₊₁ ⊬ A. Let Γₙ₊₁ ⊢ A instead. Thus, Δₙ ∪ {B[x := c]} ⊢ A. As c cannot occur in Σ (why?), and Δₙ ⊢ (∃x)B, the conditions of 6.5.19 apply and we have Δₙ ⊢ A; a contradiction.

(iii) For every n ≥ 0, Γₙ ⊬ A. By trivial induction on n, via (i) and (ii), given that Γ₀ = Σ ⊬ A.


We now define Γ by

Γ = ⋃ₙ≥₀ Γₙ     (5)

(iv) Γ ⊬ A. If not, let B₁, ..., Bₖ be all the formulae of Γ used in a proof that certifies Γ ⊢ A. Let Γₘ, for appropriate m, contain all the Bᵢ. Then Γₘ ⊢ A, contradicting (iii).

(v) For every B ∈ WFF(N), either B ∈ Γ or (¬B) ∈ Γ, but not both. Indeed, each B is some Gᵢ. By (3) and (4), at stage i of the construction, one of Gᵢ or ¬Gᵢ is placed in Γᵢ₊₁. If at different stages (why “different”?) both some B and ¬B enter in Γ, then, by 2.5.7, Γ ⊢ ⊥, and hence (cf. 2.6.6) Γ ⊢ A, contradicting (iv).

(vi) Γ is a maximal consistent theory in that whenever B ∉ Γ, then Γ ∪ {B} is inconsistent. Indeed, if B ∉ Γ, then (¬B) ∈ Γ, and hence Γ ∪ {B} ⊢ ¬B. But Γ ∪ {B} ⊢ B, too.

(vii) Γ is deductively closed; that is, Γ ⊢ B implies B ∈ Γ. Otherwise, by maximal consistency (cf. 2.6.6), we get Γ ∪ {B} ⊢ A; hence Γ ⊢ A (2.1.6), contradicting (iv).

(viii) Λ₂ ⊆ Γ. Indeed, by deductive closure and 2.1.1, B ∈ Λ₂ entails Γ ⊢ B for any B.

(ix) Γ is an N-Henkin theory, meaning that Γ ⊢ (∃x)B implies, for some k ∈ N, Γ ⊢ B[x := k]. Indeed, let Γ ⊢ (∃x)B, where (∃x)B is Eₙ, for some n. By (vii), Eₙ ∈ Γₘ and hence, by (3), Eₙ ∈ Δₘ for some m. Without loss of generality we may assume m = n. Indeed,

Case 1: If m < n is our original situation, then note that Δₘ ⊆ Δₙ.

Case 2: If m > n is our original situation, then, as (∃x)B is enumerated as an Eᵢ infinitely often, take an i > m such that Eᵢ is still (∃x)B. But now we are back to Case 1.

Thus, by (4), B[x := k] is placed in Γₙ₊₁ for some k ∈ N.


So far, the construction presented and its properties track fairly closely the one
given in Section 3.2, and its properties. The primary design aim in either construction
was to construct a maximal consistent theory Γ that contains Σ, but fails to prove
some given formula A, and use the theory’s properties to construct a model for
Σ. But what is (ix) good for? In informal mathematics the truth of an existential
statement such as (∃x)B, where B has at most only x free, entails the existence of
an object (constant) c that makes B[x := c] true; this is the semantics of ∃. This
phenomenon is not replicated in formal logic in general; that is, (∃x)B → B[x := c]
is not a theorem schema. See Exercise A.2.13. Henkin’s construction in (4), which
has as a consequence (ix) above, is the additional work we have to do to make the
huge (cf. (v) and (vi)) theory Γ behave, with respect to the quantifier ∃, according to
what the quantifier’s semantics dictate. Such behavior in turn supports our task to
build a model of Σ whose informal logical properties we can mimic formally within
the theory Γ.

The final concern is to allow Γ to handle constants in the domain of the
(not-yet-constructed) model of Σ with some of the effectiveness informal mathematics can
exhibit. In particular, we will need one more property in order to finally define our
model: that Γ can distinguish constants; that is, if m ≠ n in N, metamathematically
speaking, then Γ certifies this by proving ¬m = n, which by (vii) means that the
“certificate” is just (¬m = n) ∈ Γ. But this is not necessarily true for the arbitrarily
chosen N! However, we can make it happen for a “smaller” set D obtained by
judiciously discarding members of N. And this involves a few more technical steps!
We start by defining the “equality class of n” for each n ∈ N:

e(n) = {m ∈ N : Γ ⊢ m = n} (6)
The e(n) sets have some interesting properties: 
e1. n ∈ e(n): Indeed, Γ ⊢ n = n by Ax5 and 6.1.19.


e2. If e(n) and e(m) both contain the same i from N, then e(n) = e(m): First, note
that the hypothesis translates to Γ ⊢ i = n and Γ ⊢ i = m. By 7.0.1 via 6.1.19
we get

Γ ⊢ n = m (7)

and

Γ ⊢ m = n (7′)

If now k ∈ e(n), then Γ ⊢ k = n; hence Γ ⊢ k = m by (7) and 7.0.1 via 6.1.19.
That is, k ∈ e(m). The converse follows using (7′) instead.


e3. If k ∈ e(n), then e(k) = e(n): by e1 and e2.

e4. If e(k) = e(n), then Γ ⊢ k = n: by assumption and e1, k ∈ e(n). We now
invoke (6).


We next define the set

D = {min e(n) : n ∈ N} (8)

where “min S” for any ∅ ≠ S ⊆ N denotes the element m_i of S that has the smallest
index i in the fixed (in our discussion) enumeration m_0, m_1, ... of N (cf. p. 226).

By e1, the selection “min e(n)” is always possible. Let next k ≠ n in D. Thus,
k = min e(i) and n = min e(j) for some i, j in N. It follows that e(k) = e(i) and
e(n) = e(j) by e3. Can we have Γ ⊢ k = n? If yes, then k ∈ e(n) by (6); hence
e(k) = e(n) by e3. We conclude that e(i) = e(j) and therefore this set has two
distinct elements, k and n, of smallest index in the (m_i)_{i≥0} enumeration of N, which
is absurd!⁸ We have:


A.2.1 Lemma. Γ distinguishes the members of D, that is, m ≠ n in D implies
Γ ⊢ ¬m = n.
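The passage from N to D can be pictured on a finite toy instance. The sketch below is only an illustration under loud assumptions: N is replaced by a short list of hypothetical constant names in a fixed enumeration, and the relation "Γ ⊢ m = n" is replaced by a hand-written function `proves_equal`, assumed to be reflexive, symmetric, and transitive (as e1 and e2 guarantee for the real relation). Nothing here decides provability in an actual theory.

```python
# Toy illustration of (6) and (8): equality classes e(n) and the set D.
# ASSUMPTION: `N` is a short finite stand-in for the enumeration m_0, m_1, ...
# ASSUMPTION: `proves_equal` is a hypothetical stand-in for "Γ ⊢ m = n".

N = ["m0", "m1", "m2", "m3", "m4"]  # fixed enumeration; list index = index i of m_i

# Hypothetical blocks of constants that the toy "theory" proves equal.
_BLOCKS = [{"m0", "m3"}, {"m1"}, {"m2", "m4"}]


def proves_equal(m, n):
    """Stand-in for Γ ⊢ m = n: true iff m and n lie in the same block."""
    return any(m in block and n in block for block in _BLOCKS)


def e(n):
    """Equality class of n, as in (6): all m in N with 'Γ ⊢ m = n'."""
    return {m for m in N if proves_equal(m, n)}


def min_of(s):
    """'min S' as in (8): the member of S with the smallest index in N."""
    return min(s, key=N.index)


# D as in (8), listed in enumeration order.
D = sorted({min_of(e(n)) for n in N}, key=N.index)
```

With these toy blocks, D consists of m0, m1, and m2: one representative per class, and no two distinct members of D are "provably equal", which is the content of A.2.1 in miniature.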


⁸ The reader who is conversant with the concepts of equivalence relations and equivalence classes will
recognize the relation between n and m in N given by Γ ⊢ n = m as an equivalence relation, i.e., one
that is reflexive, symmetric, and transitive. He will also recognize the set e(n) as the equivalence class
with representative n. We avoided this terminology in the interest of those readers who have not seen
these concepts before.

A.2.2 Lemma. Γ is a D-Henkin theory (cf. (ix)).

Proof. So let Γ ⊢ (∃x)B. We know from (ix) that for some k ∈ N, we have
Γ ⊢ B[x := k]. Let n = min e(k). Then n ∈ D and Γ ⊢ n = k (by (6) and (8)). By
Ax6, Γ ⊢ B[x := n] ≡ B[x := k]; hence Γ ⊢ B[x := n]. □


A.2.3 Exercise. D ≠ ∅. □


In summary, we have the statement below: 


A.2.4 Lemma. (Main Semantic Lemma) If Σ is a theory over a first-order language
L that cannot prove A (also over L), and N is any infinite countable set,
then there is a nonempty subset D ⊆ N and a D-Henkin theory Γ ⊆ WFF(N) that
extends Σ (that is, Σ ⊆ Γ), which distinguishes the constants of D (in the sense of
A.2.1) and satisfies (iv)–(viii) as well.


A.2.5 Remark. We continue working with L, Σ, N, D, Γ as in A.2.4. By Ax5 and
6.1.19, ⊢ t = t for any N-term t; hence Γ ⊢ t = t. By 6.5.2 we have Γ ⊢ (∃x)x = t.
Thus, Γ ⊢ m = t for some m ∈ D, since Γ is D-Henkin. This m is unique in D
for, if not, then we also have Γ ⊢ n = t, for some n ∈ D where n ≠ m. By 7.0.1,
Γ ⊢ m = n. But A.2.1 yields Γ ⊢ ¬m = n, contradicting consistency of Γ (iv). □


We are ready to define an interpretation 𝔇 = (D, M) of L:


A.2.6 Definition. We start with a consistent theory Σ over a first-order language L,
and an arbitrary countable infinite set N. We will actually define the interpretation
𝔇 = (D, M) for the augmented language L′, which has as alphabet V′ = V ∪ (N −
D).⁹ Trivially, this will induce an interpretation for L itself: All we have to do is to
forget that we have interpreted also the constants from N − D. In what follows we
faithfully track Definition A.1.5. We thus give meaning to all elements of V′(D):


(1) For each object variable y from L′ we have Γ ⊢ m = y for a unique m ∈ D (cf.
A.2.5). We set M(y) = m.

(2) For each Boolean variable q from L′, we set M(q) = t iff q ∈ Γ.

(3) We set M(⊥) = f and M(⊤) = t.

(4) For each constant symbol c from L′ we have Γ ⊢ m = c for a unique m ∈ D by
Remark A.2.5. We set M(c) = m.


Note that if c ∈ D, then (m being also in D) c = m by an argument based
on the concluding remarks (uniqueness) in A.2.5. This is as it should be! (Cf.
A.1.5(5).)

⁹ Recall that N − D, set difference, denotes all the members of N that are not in D.

(5) Let f be a function symbol from L′ of arity k > 0. We want to specify a
“concrete” (metamathematical) M(f). We do so by specifying what inputs (from
D) generate what outputs (in D) under this M(f). Let then m_1,...,m_k ∈ D.
By A.2.5, Γ ⊢ m = f(m_1,...,m_k) for a unique m ∈ D. Thus we define
M(f)(m_1,...,m_k), the output, to be this unique m: M(f)(m_1,...,m_k) =
m.


(6) For each k-ary predicate symbol φ from L′, we let M(φ) be the metamathematical
relation that has the following input/output behavior: M(φ)(n_1,...,n_k) = t iff
φ(n_1,...,n_k) ∈ Γ. □


We will prove that 𝔇 is a model of Σ with the help of two lemmata.

A.2.7 Lemma. For every D-term t over L′ (same as N-term over L!), Γ ⊢ t = m if
m = M(t).¹⁰


Proof. By induction on the complexity of t: If t is a variable or constant we are done
by (1) and (4) of A.2.6 (being mindful of 7.0.1). Suppose that t is f(t_1,...,t_n). By
the I.H. we have

Γ ⊢ t_i = k_i, where k_i = M(t_i), for i = 1,...,n

Thus, by 7.0.7, and since k_i are also formal names of constants,

Γ ⊢ f(t_1,...,t_n) = f(k_1,...,k_n) (*)

Now, M(t) = M(f)(M(t_1),...,M(t_n)) by A.1.6. In other words, M(t) =
M(f)(k_1,...,k_n), say, = j. By A.2.6(5), Γ ⊢ j = f(k_1,...,k_n), which by 7.0.1
and (*) yields Γ ⊢ f(t_1,...,t_n) = j. □


A.2.8 Lemma. For every D-formula B over L′, M(B) = t iff B ∈ Γ.

Proof. By deductive closure of Γ (cf. (vii), p. 227), we can use B ∈ Γ and Γ ⊢ B
interchangeably. We use induction on the complexity of B (cf. A.1.6).

The atomic cases: 


(i) B is p. We are done by A.2.6(2).

(ii) B is ⊤. As M(⊤) = t (A.2.6(3)), we need to show ⊤ ∈ Γ. This is so by (viii)
on p. 227.

(iii) B is ⊥. As M(⊥) = f (A.2.6(3)), we need to show ⊥ ∉ Γ. This is so by (iv)
on p. 227 (cf. also 2.6.6).


¹⁰ This “Γ ⊢ t = m if m = M(t)” may appear a bit roundabout. Why not just say “Γ ⊢ t = M(t)”?
Well, we allowed letters such as i, j, k, m, n to have a dual role, as formal names of constants imported
into L′ from D and as informal names of members of D. However, we have made no agreements to adopt
notation such as “M(...)” formally, nor will we. M remains a symbol outside our formal alphabet V′.




(iv) B is t = s. In one direction, let M(t = s) = t, that is (A.1.6), M(t) = M(s).
Let us call i this member of D. By A.2.7 we have Γ ⊢ t = i and Γ ⊢ s = i,
hence (7.0.1) Γ ⊢ t = s.
In the other direction we start with what we have just concluded. Then A.2.7
yields Γ ⊢ t = i and Γ ⊢ s = j, where i = M(t) and j = M(s). By 7.0.1,
Γ ⊢ i = j. This entails i = j metamathematically (in D), and hence M(t =
s) = t, since otherwise Γ ⊢ ¬i = j (A.2.1), contradicting Γ’s consistency.


(v) B is φ(t_1,...,t_n). Let k_i = M(t_i) (i = 1,...,n).
By A.2.6(6), M(φ)(k_1,...,k_n) = t iff φ(k_1,...,k_n) ∈ Γ iff

Γ ⊢ φ(k_1,...,k_n) (**)

By repeated application of Ax6 and A.2.7 (the latter yielding Γ ⊢ t_i = k_i,
i = 1,...,n) we see that (**) is equivalent to Γ ⊢ φ(t_1,...,t_n).


The nonatomic cases: 


If B is any of ¬C or C ∘ D for ∘ ∈ {∧, ∨, →, ≡}, then the argument is precisely
the same as the one given in the proof of 3.2.1, under “Main Claim” (p. 97). Thus
we will consider here only the case where B is (∀x)C. Let then M((∀x)C) = t.
Thus (A.1.6), M(C[x := i]) = t, for all i ∈ D. By the I.H. we have

C[x := i] ∈ Γ, for all i ∈ D (***)

We want to conclude that (∀x)C ∈ Γ. If not, then ¬(∀x)C is in Γ ((v), p. 227), which
via 3.2.1 and the definition of ∃, is equivalent to (∃x)¬C ∈ Γ. By the D-Henkin
property of Γ, for some k ∈ D we have ¬C[x := k] in Γ, which along with (***)
contradicts the consistency of Γ ((iv), p. 227).

Conversely, let (∀x)C ∈ Γ. Then (6.1.5 and deductive closure of Γ), C[x := i] ∈
Γ, for all i ∈ D. By the I.H. M(C[x := i]) = t, for all i ∈ D; hence M((∀x)C) = t
by A.1.6. □

Thus, by Σ ⊆ Γ and the just proved A.2.8, M(B) = t for all B ∈ Σ, while
M(A) = f by (iv) of p. 227. In summary,


A.2.9 Lemma. With Σ and A as given ((1) on p. 223) and 𝔇 as constructed, 𝔇 is a
model of Σ, but not of A. Thus Σ ⊭ A.


A.2.10 Corollary. (The Consistency Theorem for First-Order Logic) Every
consistent theory has a model.


Proof. Start with a consistent Σ. Thus for some A in its language, Σ ⊬ A. □


A.2.11 Metatheorem. (Gödel’s Completeness Theorem) Given a theory Σ over a
first-order language L. If A is a formula over L, then Σ ⊨ A implies Σ ⊢ A.


Proof. We have already shown that if Σ ⊬ A, then Σ ⊭ A. □

A.2.12 Exercise. (Compactness of First-Order Logic) Let Σ be a finitely satisfiable
theory over some language L, that is, every finite subset of Σ is satisfiable
(A.1.9). Prove that the entire Σ is also satisfiable. □


A.2.13 Exercise. Let L be a first-order language with only one constant, c. Show
that ⊬ (∃x)B → B[x := c]. □


A.3 A BRIEF THEORY OF COMPUTABILITY


Computability is the part of logic that gives a mathematically precise formulation to 
the concepts algorithm, mechanical procedure, and calculable function (or relation). 
Its advent was strongly motivated, in the 1930s, by Hilbert’s program, in particular by 
his belief that the Entscheidungsproblem, or decision problem, for axiomatic theories, 
that is, the problem “Is this formula a theorem of that theory?” was solvable by a 
mechanical procedure that was yet to be discovered. 

Now, since antiquity, mathematicians have invented “mechanical procedures”, 
e.g., Euclid’s algorithm for the “greatest common divisor”,¹¹ and had no problem
recognizing such procedures when they encountered them. But how do you math- 
ematically prove the nonexistence of such a mechanical procedure for a particular 
problem? You need a mathematical formulation of what is a “mechanical procedure” 
in order to do that! 

Intensive activity by many (Post [37, 38], Kleene [26], Church [4], Turing [55], 
Markov [34]) led in the 1930s to several alternative formulations, each purporting 
to mathematically characterize the concepts algorithm, mechanical procedure, and 
calculable function. All these formulations were quickly proved to be equivalent; 
that is, the calculable functions admitted by any one of them were the same as those 
that were admitted by any other. This led Alonzo Church to formulate his conjecture, 
famously known as “Church’s Thesis”, that any intuitively calculable function is 
also calculable within any of these mathematical frameworks of calculability or 
computability.¹²

By the way, Church proved ([3, 4]) that Hilbert’s Entscheidungsproblem admits no 
solution by functions that are calculable within any of the known mathematical frame- 
works of computability. Thus, if we accept his “thesis”, the Entscheidungsproblem 
admits no algorithmic solution, period! 

The eventual introduction of computers further fueled the study of and research 
on the various mathematical frameworks of computation, “models of computation” 
as we often say, and “computability” is nowadays a vibrant and very extensive field. 
The model of computation that I will present here, due to Shepherdson and Sturgis
[44], is a later model that has been informed by developments in computer science,
in particular by the advent of so-called high-level¹³ programming languages.

¹¹ That is, the largest positive integer that is a common divisor of two given integers.

¹² I stress that even if this sounds like a “completeness theorem” in the realm of computability, it is not. It
is just an empirical belief, rather than a provable result. For example, Péter [36] and Kalmár [25] have
argued that it is conceivable that the intuitive concept of calculability may in the future be extended so
much as to transcend the power of the various mathematical models of computation that we currently
know.


A.3.1 A Programming Framework for Computable Functions 


So, what is a computable function, mathematically speaking? There are two main 
ways to approach this question. One is to define a programming formalism—that 
is, a programming language—and say “a function is computable precisely if it can 
be ‘programmed’ in the programming language”. Such programming languages are 
the Turing Machines (or TMs) of Turing and the unbounded register machines (or 
URMs) of Shepherdson and Sturgis. Note that the term machine in each case is 
a misnomer, as both the TM and the URM formulations are really programming 
languages, the first being very much like assembly language of “real” computers, the 
latter reminding us more of (subsets of) Algol (or Pascal). 

The other main way is to define a set of computable functions inductively, starting 
with some initial functions, and allowing the iteration of function-building operations 
to build all the remaining functions of the set. This approach (originally due to 
Dedekind [8] for what we nowadays call primitive recursive functions, and later 
due to Kleene [26] for what we nowadays call partial recursive functions) is very
elegant, but is less intuitively immediate, whereas the programming approach has the 
attraction of being natural to those who have done some programming. 

We now embark on defining the high-level programming language URM. The 
alphabet of the language is 


←, +, ∸, :, X, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, if, else, goto, stop (1)


Just like any other high level programming language, URM manipulates the contents 
of variables. However, these are restricted to be of natural number type—i.e., the 
only type of data such variables can denote (or “hold”, or “contain”, in programming 
jargon) are members of N. Since this programming language is for theoretical 
considerations only—rather than practical implementation—every variable is allowed 
to hold any natural number whatsoever, hence the “UR” in the language name 
(“unbounded register”, used synonymously with variable of unbounded capacity). 

The syntax of the variables is simple: A variable (name) is a string that starts with 
X and continues with one or more 1: 


URM variable set: X1,X11,X111,X1111,... (2) 


Nevertheless, as we have been doing in the case of first-order languages, we will 
more conveniently utilize the bold face lower case letters x, y,z, u,v, w, with or 
without subscripts or primes as metavariables in our discussions of the URM, and in 
examples of programs. 

Rather than employing “BNF” notation to define the language (cf. p. 17), that
is, the syntax of URM programs, I will simply say that a URM program is a finite
(ordered) sequence of instructions (or commands) of the following five types:

L : x ← a
L : x ← x + 1
L : x ← x ∸ 1 (3)
L : stop
L : if x = 0 goto M else goto R

where L, M, R, a, written in decimal notation, are in N, and x is some variable. We
call instructions of the last type if-statements.

¹³ The level is “higher” the more the programming language is distanced from machine-dependent details.

Each instruction in a URM program must be numbered by its position number, 
L, in the program, with “:” separating the position number from the instruction. We call
these numbers labels. Thus, the label of the first instruction is always “1”. The 
instruction stop must occur only once in a program, as the last instruction. 

The semantics of each command is given in the context of a URM computation. 
The latter we will let have its intuitive meaning in this subsection, and we will defer 
a mathematical definition until Subsection A.3.3, where such a definition will be 
needed. 

Thus, for now, a computation is the process that cycles along the instructions of 
a program, during which process each instruction that is visited upon—the current 
instruction—causes an action that we usually term “the result of the execution” of 
the instruction. I said “cycles along” because instructions of the last two types (may)
cause the computation to loop back or cycle, revisiting an instruction that was already 
visited by the computation. 

Every computation begins with the instruction labeled “1” as the current instruc- 
tion. The semantic action of instructions of each type is defined if and only if they 
are current, and is as follows: 


(i) L : x ← a. Action: The value of x becomes the (natural) number a. Instruction
L + 1 will be the next current instruction.

(ii) L : x ← x + 1. Action: This causes the value of x to increase by 1. The
instruction labeled L + 1 will be the next current instruction.

(iii) L : x ← x ∸ 1. Action: This causes the value of x to decrease by 1, if it was
originally nonzero. Otherwise it remains 0. The instruction labeled L + 1 will
be the next current instruction.

(iv) L : stop. Action: No variable (referenced in the program) changes value. The
next current instruction is still the one labeled L.

(v) L : if x = 0 goto M else goto R. Action: No variable (referenced in the
program) changes value. The next current instruction is numbered M if x = 0;
otherwise it is numbered R.

This command is syntactically illegal (meaningless) if any of M or R exceed
the label of the program’s stop instruction.

We say that a computation terminates, or halts, iff it ever makes (as we say
“reaches”) the instruction stop current. Note that the semantics of “L : stop”
appear to require the computation to continue ad infinitum, but it does so in a trivial 
manner where no variable changes value, and the current instruction remains the 
same: Practically, the computation is over. 

One usually gives names to URM programs, or as we just say, “to URMs”, such 
as M,N, P,Q,R,F,H,G. 


A.3.1 Definition. (Computing a Function) We say that a URM, M, computes a
function f of n arguments provided that, for some choice of variables x_1,...,x_n of M
that we designate as input variables and a choice of a variable y that we designate as
the output variable, the following precise conditions hold for every choice of input
sequence (or “n-tuple”), a_1,...,a_n from N:

(1) We initialize the computation, by doing two things:

(a) We initialize the input variables with the input values a_1,...,a_n. We
initialize all other variables of M to be 0.

(b) We next make the instruction labeled “1” current, and thus start the
computation.

(2) The computation terminates iff f(a_1,...,a_n) is defined, or, symbolically, iff
“f(a_1,...,a_n) ↓”.

(3) If the computation terminates, that is, if at some point the instruction stop
becomes current, then the value of y at that point (and hence at any future point,
by (iv) above), is f(a_1,...,a_n). □
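The semantics (i) to (v) and the initialization convention of A.3.1 can be sketched as a small interpreter. This is a minimal sketch, not part of the URM definition: the tuple encoding of instructions, the function name `run_urm`, and the `max_steps` cutoff are all assumptions made for illustration. The cutoff is needed because a computation may never reach stop, and a program cannot in general tell that in advance.

```python
# A minimal URM interpreter sketch.
# ASSUMPTION: the tuple encoding of instructions and the `max_steps` cutoff
# are illustrative choices, not part of the URM definition.

def run_urm(program, inputs, output, max_steps=10_000):
    """Run `program` (a list; the instruction labeled L sits at index L - 1).

    Instruction encodings (hypothetical):
      ("set", x, a)    L : x ← a
      ("inc", x)       L : x ← x + 1
      ("dec", x)       L : x ← x ∸ 1   (proper subtraction)
      ("stop",)        L : stop
      ("if", x, M, R)  L : if x = 0 goto M else goto R
    `inputs` maps input variable names to natural numbers; every other
    variable starts at 0. Returns the value of `output` once stop becomes
    current, or None if that has not happened within `max_steps` steps.
    """
    values = dict(inputs)
    label = 1  # every computation starts at the instruction labeled 1
    for _ in range(max_steps):
        instr = program[label - 1]
        op = instr[0]
        if op == "stop":
            return values.get(output, 0)
        if op == "set":
            values[instr[1]] = instr[2]
            label += 1
        elif op == "inc":
            values[instr[1]] = values.get(instr[1], 0) + 1
            label += 1
        elif op == "dec":
            values[instr[1]] = max(values.get(instr[1], 0) - 1, 0)
            label += 1
        elif op == "if":
            label = instr[2] if values.get(instr[1], 0) == 0 else instr[3]
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    return None  # no verdict: the computation may or may not halt later


# 1 : x ← x + 1, 2 : stop
SUCC = [("inc", "x"), ("stop",)]

# A program that never reaches stop: 1 : if x = 0 goto 1 else goto 1, 2 : stop
LOOP = [("if", "x", 1, 1), ("stop",)]
```

Calling `run_urm(SUCC, {"x": 4}, "x")` designates x as both input and output variable, while `run_urm(LOOP, {"x": 0}, "x")` exhausts the step budget and returns None, which is the interpreter's stand-in for "no answer yet" rather than a proof of nontermination.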


(1) The notation “f(a_1,...,a_n) ↑” means that f(a_1,...,a_n) is undefined.


(2) The function computed by a URM, M, with inputs and output designated as
above, can also be denoted with the symbol M_y^{x_1,...,x_n}. This symbol, with no need
for comment, makes it clear as to which are the input variables (superscript) of M,
and which is the output variable (subscript). The variables x_1,...,x_n in M_y^{x_1,...,x_n}
are “apparent”, or not free for substitution; since M_y^{x_1,...,x_n} is not a term (in the
predicate logic sense of the word), it does not denote a value. Note also that any
attempt to effect such substitutions, for example, putting the number 3 in place of
an input variable, would lead, in general,
to nonsensical situations like “L : 3 ← 3 + 1”, a command that wants to change the
(standard) value of the symbol “3” (from 3 to 4)!


Thus, we may write f = M_y^{x_1,...,x_n}, but not f(a_1,...,a_n) = M_y^{a_1,...,a_n}.


Note that f denotes, by name, a function, that is, a potentially infinite table of
input/output pairs, where the input is always an n-tuple. On the other hand, M_y^{x_1,...,x_n}
goes a step further: It finitely represents the table f, being able to do so because it
is a finite set of instructions that can be used to compute the output for each input
where f is defined.


A.3.2 Definition. (Computable Functions) A function f of n variables x_1,...,x_n
is called partial computable iff for some URM, M, we have f = M_y^{x_1,...,x_n}. The
set of all partial computable functions is denoted by 𝒫. The set of all the total
functions in 𝒫 (that is, those that are defined on all inputs from N) is the set of
computable functions and is denoted by ℛ. The term recursive is used in the literature
synonymously with the term computable. □


Note that since a URM is a theoretical, rather than practical, model of computation 
we do not include human-computer-interface considerations in the computation. 
Thus, the “input” and “output” phases just happen during initialization—they are 
not part of the computation. That is why we have dispensed with both read and 
write instructions and speak instead of initialization in (1) of A.3.1. This approach 
to input/output is entirely analogous with the input/output convention for the other 
well-known model of computation, the Turing machine (cf. [6, 24, 31, 46, 49]). 


A.3.3 Example. Let M be the program


1 : x ← x + 1
2 : stop


Then M_x^x is the function f given for all x ∈ N by f(x) = x + 1, the successor
function. □


A.3.4 Remark. (A Notation) To avoid saying verbose things such as “M_x^x is the
function f given for all x ∈ N by f(x) = x + 1”, we will often use Church’s
λ-notation and write instead “M_x^x = λx.x + 1”.

In general, the notation “λ ⋯ .” marks the beginning “λ” and the end “.” of a
sequence of input variables “⋯”. What comes after the period “.” is the “rule” that
indicates how the output relates to the input. The template for λ-notation thus is


λ“input”.“output-rule”


Relating to the above example, we note that f = λx.x + 1 = λy.y + 1 = λz.f(z)
is correct. To the left and right of each “=” we have the table for a function, and we
are saying all these tables are the same. Note that x, y, z are “apparent” variables
(“dummy”, bound) and are not free (for substitution). In particular, f = f(x) is
incorrect as we have distinct types to the left and right of “=”: a table and a number
(albeit unspecified number).


Pause. Why bother with these notational acrobatics? Because well-chosen notation
protects against meaningless statements, such as

M_x^x = x + 1 (1)

that one might make in the context of the above example. As remarked before, “M_x^x”
is not a term, nor are the occurrences of x in it free (for substitution). For example,
“M_3^3 = 4” (substituting 3 for x throughout in (1)) is totally meaningless, as it says

1 : 3 ← 3 + 1
2 : stop            = 4

However, M_x^x = λx.x + 1 does make syntactic and semantic sense; indeed it is true,
as two tables are compared and are found to be equal! Since λx.x + 1 = λy.y + 1
the following three tables are identical:¹⁴


In programming circles, the distinction between function definition or declaration,
λx⃗.f(x⃗), and function invocation (or call, or application, or “use”), what we call a
term, f(x⃗), in first-order language parlance, is well known. The definition part, in
programming, uses various notations depending on the programming language and
corresponds to writing a program that implements the function, just as we did with
M here.

There is a double standard in notation, when it comes to relations. A relation R,
in the metatheory, is a table (i.e., set) of n-tuples. Its counterpart in formal logic is a
formula. But where in the formal theory we almost never write a formula A as A(x)
in order to draw attention to our interest in its (free) variable x, in the metatheory
most frequently we write a relation R as R(x⃗_n), without λ notation, thus drawing
attention to its “input slots”, which here are x_1,...,x_n (i.e., its “free variables”).

Since stating “R(a⃗_n)”, by convention, is short for “a⃗_n ∈ R”, we have two notations
for a relation: Relational, i.e., R(x⃗_n), and set-theoretic, i.e., x⃗_n ∈ R, both without
the benefit of λ notation. There are exceptions to this practice, for example, when
we define one relation from another one via the process of “freezing” some of the
original relation’s inputs. For example, writing x < y (the standard “less than” on
N) means that both x and y are meant to be inputs; we have a table of ordered pairs.
However, we will write λx.x < y to convey that y is fixed and that the input is just
x. Clearly, a different relation arises for each y; we have an infinite family of tables:
For y = 0 we have the empty table; for y = 1 one that contains just 0; for y = 2 one
that contains just 0, 1; etc. □
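Readers who program may find it helpful to see the two points of this remark, renaming of bound variables and "freezing" an input, in executable form. The following is a sketch under assumptions: Python's `lambda` stands in for λ-notation, tables are compared only on a finite initial segment of N (so the comparison can suggest, but not prove, equality of the full tables), and the helper names are invented here.

```python
# λ-notation in miniature.
# ASSUMPTION: tables are inspected only on the finite segment 0..BOUND-1 of N.
BOUND = 50

succ_x = lambda x: x + 1   # plays the role of λx.x + 1
succ_y = lambda y: y + 1   # plays the role of λy.y + 1 (bound variable renamed)


def table(f, bound=BOUND):
    """Finite fragment of the input/output table of a one-argument function."""
    return [(a, f(a)) for a in range(bound)]


def less_than_frozen(y):
    """The relation λx.x < y with y frozen, returned as a one-input predicate."""
    return lambda x: x < y


def relation_table(r, bound=BOUND):
    """The inputs below `bound` that the one-input relation `r` accepts."""
    return [a for a in range(bound) if r(a)]
```

On this finite fragment, `table(succ_x)` and `table(succ_y)` coincide, and `relation_table(less_than_frozen(y))` gives the empty list for y = 0, the list containing just 0 for y = 1, and the list 0, 1 for y = 2, matching the family of tables described above.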


A.3.5 Example. Let M be the program


1 : x ← x ∸ 1
2 : stop


Then M_x^x is the function λx.x ∸ 1, the predecessor function. The operation ∸ is
called “proper subtraction” and is in general defined by


x ∸ y = x − y   if x ≥ y
        0       otherwise


It ensures that subtraction (as modified) does not take us out of the set of the so-called
number-theoretic functions, which are those with inputs from N and outputs in N.
□

¹⁴ “x → [M]” means “x is input to M” and “[M] → x” indicates “x is output from M”.


Pause. Why are we restricting computability theory to number-theoretic func- 
tions? Surely, in practice we can compute with negative numbers, rational numbers, 
and with nonnumerical entities, such as graphs, trees, etc. Theory ought to reflect, 
and explain, our practices, no? It does. Negative numbers and rational numbers 
can be coded by natural number pairs. Computability of number-theoretic functions 
can handle such pairing (and unpairing; decoding). Moreover, finite objects such 
as graphs, trees, and the like that we manipulate via computers can be also coded 
(and decoded) by natural numbers. After all, the internal representation of data in 
computers is, at the lowest level, via natural numbers represented in binary notation. 
Computers cannot handle infinite objects such as (irrational) real numbers. But there 
is an extensive computability theory (which originated with the work of Kleene
[27]) that can handle such numbers as inputs and also compute with them. But this
is beyond our scope. 
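To make the remark about coding concrete, here is one possible coding of pairs, and through pairs of integers, by single natural numbers. It is a sketch only: the Cantor pairing function is an assumption chosen for convenience (the text does not commit to any particular coding), and the function names are invented for this illustration.

```python
from math import isqrt

# ASSUMPTION: the Cantor pairing function is one convenient choice of coding;
# the text does not commit to a particular one.

def pair(a, b):
    """Code the pair (a, b) of natural numbers as a single natural number."""
    return (a + b) * (a + b + 1) // 2 + b


def unpair(n):
    """Decode: recover the unique (a, b) with pair(a, b) == n."""
    w = (isqrt(8 * n + 1) - 1) // 2   # w = a + b
    b = n - w * (w + 1) // 2
    return w - b, b


def encode_int(z):
    """Code an integer z as pair(sign, |z|), with sign 1 for negative z."""
    return pair(1 if z < 0 else 0, abs(z))


def decode_int(n):
    """Inverse of encode_int on its range."""
    sign, magnitude = unpair(n)
    return -magnitude if sign == 1 else magnitude
```

Both `pair` and `unpair` use only addition, multiplication, integer division, and an integer square root on natural numbers, so a function on integers can be studied through the number-theoretic function that acts on the codes.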


A.3.6 Example. Let M be the program


1 : x ← 0
2 : stop


Then M_x^x is the function λx.0, the zero function. □


In Definition A.3.2 we spoke of partial computable and total computable functions. 
We retain the qualifiers partial and total for all number-theoretic functions, even 
for those that may not be computable. Thus a function is total iff it is everywhere 
defined and is nontotal (no hyphen) otherwise. The set union of all total and nontotal 
number-theoretic functions is the set of all partial functions. Thus partial is not 
synonymous with nontotal. Compare with the so-called partial order¹⁵ of discrete
mathematics (and set theory). A partial order may be also a total (or linear) order. 


¹⁵ A two-place (binary) relation R on a set D is a partial order iff it is irreflexive, that is, for no x can we
have R(x, x), and transitive, that is, R(x, y) and R(y, z) imply R(x, z). It is a total or linear order if,
moreover, we have trichotomy: For any two elements x and y of D, R(x, y) ∨ x = y ∨ R(y, x) is true.

If D is N, then the “less than” relation, <, is a total order on D. If D is the set of all subsets of N, then
⊂ (proper subset relation) is a partial but not total order on D.

A.3.7 Example. The unconditional goto instruction, namely, “L : goto L′”, can
be simulated by L : if x = 0 goto L′ else goto L′. □


A.3.8 Example. Let M be the program segment


k − 1 : x ← 0
k : x ← x + 1
k + 1 : z ← z ∸ 1
k + 2 : if z = 0 goto k + 3 else goto k
k + 3 : ...


What it does, by the time the computation reaches instruction k + 3, is to have set
the value of z to 0, and to make the value of x equal to the value that z had when
instruction k − 1 was current. In short, the above sequence of instructions simulates
the following sequence

L : x ← z
L + 1 : z ← 0
L + 2 : ...


where the semantics of L : x ← z are standard in programming: They require that
upon execution of the instruction the value of z is copied into x, but the value of z
remains unchanged. □
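The segment can be traced mechanically by transliterating it line by line. The sketch below is an assumption-laden illustration, not a URM: Python integers stand in for the registers x and z, proper subtraction is rendered as `max(z - 1, 0)`, and the function name `run_segment` is invented here.

```python
# Direct transliteration of instructions k - 1, ..., k + 2 into Python.
# ASSUMPTION: Python integers stand in for URM registers; proper subtraction
# is rendered as max(z - 1, 0).

def run_segment(z):
    """Return (x, z) at the moment instruction k + 3 becomes current."""
    x = 0                    # k - 1 : x ← 0
    while True:
        x = x + 1            # k     : x ← x + 1
        z = max(z - 1, 0)    # k + 1 : z ← z ∸ 1
        if z == 0:           # k + 2 : if z = 0 goto k + 3 else goto k
            return x, z
```

For a nonzero start value the trace agrees with the description: `run_segment(3)` returns `(3, 0)`. With the segment exactly as listed, a start value of 0 still executes instruction k once, so `run_segment(0)` returns `(1, 0)`; that boundary case may be worth a separate look when working Exercise A.3.9.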


A.3.9 Exercise. Write a program segment that simulates precisely L : x ← z; that
is, copy the value of z into x without causing z to change as a side effect. □


Because of the above, without loss of generality, one may assume of any input
variable, x, of a program M that it is read-only. This means that its value remains
invariant throughout any computation of the program. Indeed, if x is not so, a new
input variable, x′, can be introduced as follows to relieve x from its input role: Add at
the very beginning of M the (derived) instruction 1 : x ← x′, where x′ is a variable
that does not occur in M. Adjust all the following labels consistently, including,
of course, the ones referenced by if-statements (a tedious but straightforward task).
Call M′ the so-obtained URM. Clearly, M′_z^{x′,y_1,...,y_n} = M_z^{x,y_1,...,y_n}.


A.3.10 Example. (Composing Computable Functions) Suppose that $\lambda x\vec{y}.f(x,\vec{y})$
and $\lambda\vec{z}.g(\vec{z})$ are partial computable, and say $f = F^{x\vec{y}}_{u}$ while $g = G^{\vec{z}}_{x}$.

Since we can rewrite any program renaming its variables at will, we assume
without loss of generality that x is the only variable common to F and G. Thus, if we
concatenate the programs G and F in that order, and (1) remove the last instruction
of G (k : stop, for some k), calling the program segment that results from this G', and
(2) renumber the instructions of F as k, k + 1, ... (and, as a result, the references
that if-statements of F make) in order to give (G'F) the correct program structure,
then $\lambda\vec{y}\vec{z}.f(g(\vec{z}),\vec{y}) = (G'F)^{\vec{y}\vec{z}}_{u}$. Note that all non-input variables of F will hold 0
as soon as the execution of (G'F) makes the first instruction of F current for the
first time. This is because none of these can be changed by G' under our assumption,
thus ensuring that F works as designed. □

Thus, we have, by repeating the above a finite number of times:


A.3.11 Proposition. If $\lambda\vec{y}_n.f(\vec{y}_n)$ and $\lambda\vec{z}.g_i(\vec{z})$, for $i = 1,\ldots,n$, are partial
computable, then so is $\lambda\vec{z}.f(g_1(\vec{z}),\ldots,g_n(\vec{z}))$.


We can rephrase A.3.11, saying simply that $\mathcal{P}$ is closed under composition. For the
record, we will define composition to mean the somewhat rigidly defined operation
used in A.3.11, that is:


A.3.12 Definition. Given any partial functions (computable or not) $\lambda\vec{y}_n.f(\vec{y}_n)$ and
$\lambda\vec{z}.g_i(\vec{z})$, for $i = 1,\ldots,n$, we say that $\lambda\vec{z}.f(g_1(\vec{z}),\ldots,g_n(\vec{z}))$ is the result of their
composition. □


We characterized the definition as “rigid”. Indeed note that it requires that all
the arguments of f be substituted by a $g_i(\vec{z})$ (unlike Example A.3.10, where we
substituted a function invocation (cf. terminology in A.3.4) in one variable of f
and did nothing with the variables $\vec{y}$), and for each application $g_i(\ldots)$ the argument
list, “...”, must be the same, for example $\vec{z}$. This rigidity is only apparent, as we
show in examples in the next subsection (A.3.2).


Composing a number of times that depends on the value of an input variable is 
iteration. The general case of iteration is called primitive recursion. 


A.3.13 Definition. (Primitive Recursion) A number-theoretic function f is defined
by primitive recursion from given functions $\lambda\vec{y}.h(\vec{y})$ and $\lambda x\vec{y}z.g(x,\vec{y},z)$ provided,
for all $x, \vec{y}$, its values are given by the two equations below:


$$\begin{aligned} f(0,\vec{y}) &= h(\vec{y}) \\ f(x+1,\vec{y}) &= g(x,\vec{y},f(x,\vec{y})) \end{aligned}$$


h is the basis function, while g is the iterator.
It will be useful to use the notation f = prim(h, g) to indicate in shorthand that
f is defined as above from h and g (note the order). □


Note that $f(1,\vec{y}) = g(0,\vec{y},h(\vec{y}))$, $f(2,\vec{y}) = g(1,\vec{y},g(0,\vec{y},h(\vec{y})))$, $f(3,\vec{y}) =
g(2,\vec{y},g(1,\vec{y},g(0,\vec{y},h(\vec{y}))))$, etc. Thus the “x-value”, 0, 1, 2, 3, etc., equals the
number of times we compose g with itself. Hence “iteration”, i.e., composition as
many times as an input value dictates.
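
The unfolding above can be mirrored by a loop. The following is a minimal Python sketch, not part of the URM formalism: the helper name `prim` is borrowed from the shorthand of A.3.13, and the sample functions `add` and `factorial` are hypothetical illustrations chosen by us.

```python
from typing import Callable


def prim(h: Callable[..., int], g: Callable[..., int]) -> Callable[..., int]:
    """Return f with f(0, *y) = h(*y) and f(x + 1, *y) = g(x, *y, f(x, *y))."""
    def f(x: int, *y: int) -> int:
        value = h(*y)                  # the basis: f(0, y)
        for i in range(x):             # compose g exactly x times
            value = g(i, *y, value)    # f(i + 1, y) from f(i, y)
        return value
    return f


# Hypothetical illustrations (names are ours, not the text's):
add = prim(lambda y: y, lambda i, y, acc: acc + 1)        # add(x, y) = x + y
factorial = prim(lambda: 1, lambda i, acc: (i + 1) * acc)  # no parameters y
```

The loop counter plays the role of the “x-value”: after i passes the variable `value` holds f(i, y).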


A.3.14 Example. (Iterating Computable Functions) Suppose that $\lambda x\vec{y}z.g(x,\vec{y},z)$
and $\lambda\vec{y}.h(\vec{y})$ are partial computable, and say $g = G^{i,\vec{y},z}_{z}$ while $h = H^{\vec{y}}_{z}$.

By earlier remarks we may assume:

(i) The only variables that H and G have in common are z, $\vec{y}$.

(ii) $\vec{y}$ are read-only in both H and G.

(iii) i is read-only in G.

(iv) x does not occur in any of H or G.

We can now argue that the following program, let us call it F, computes f defined
as in A.3.13 from h and g, where H' is program H with the stop instruction
removed, G' is program G with the stop instruction removed, and instructions
have been renumbered (and if-statements adjusted) as needed:


         H'
r:       i ← 0
r+1:     if x = 0 goto k+m+2 else goto r+2
r+2:     x ← x ∸ 1
         G'
k:       i ← i + 1
k+1:     w_1 ← 0
         ...
k+m:     w_m ← 0
k+m+1:   goto r+1
k+m+2:   stop


The instructions w_i ← 0 set explicitly to zero all the variables of G' other than
i, z, $\vec{y}$ to ensure correct behavior of G'. Note that the w_i are implicitly initialized to
zero only the first time G' is executed. Clearly, $f = F^{x,\vec{y}}_{z}$. □


We have at once: 


A.3.15 Proposition. If f, g, h relate as in Definition A.3.13 and h and g are in $\mathcal{P}$,
then so is f. We say that $\mathcal{P}$ is closed under primitive recursion.


A.3.16 Example. (Unbounded Search) Suppose that $\lambda x\vec{y}.g(x,\vec{y})$ is partial computable,
and say $g = G^{x,\vec{y}}_{z}$. By earlier remarks we may assume that $\vec{y}$ and x are
read-only in G and that z is not one of them.

Consider the following program F, where G' is program G with the stop instruction
removed, and instructions have been renumbered (and if-statements adjusted) as
needed so that its first command has label 2.


1:       x ← 0
         G'
k:       if z = 0 goto k+l+3 else goto k+1
k+1:     w_1 ← 0   {Comment. Setting all non-input variables to 0; cf. A.3.14.}
         ...
k+l:     w_l ← 0   {Comment. Setting all non-input variables to 0; cf. A.3.14.}
k+l+1:   x ← x + 1
k+l+2:   goto 2
k+l+3:   stop

Let us set $f = F^{\vec{y}}_{x}$. Note that, for any $\vec{a}$, $f(\vec{a})\downarrow$ precisely if the URM F, initialized
with $\vec{a}$ as the input values in $\vec{y}$, ever reaches stop. This condition becomes true as
long as the two conditions, (1) and (2), are fulfilled:

(1) Instruction k just found that z holds 0. This value of z is the result of an
execution of G (i.e., G' with the stop instruction added) with input values $\vec{a}$ in $\vec{y}$
and, say, b in x, the latter being the iteration counter (0, 1, 2, ...) that indicates how
many times instruction 2 becomes current.

(2) In none of the previous iterations (with x-value < b) did G' (essentially, G)
get into a nonending computation (infinite loop).


Correspondingly, the computation of F will never halt for an input $\vec{a}$ if either G
loops forever at some step, or if it halts in every iteration b but nevertheless never
exits with a z-value of 0.


Thus, for all $\vec{a}$,


$$f(\vec{a}) = \min\{x : g(x,\vec{a}) = 0 \land (\forall y)(y < x \to g(y,\vec{a})\downarrow)\}$$ □

A.3.17 Definition. The operation on partial functions g given for all $\vec{a}$ by
$$\min\{x : g(x,\vec{a}) = 0 \land (\forall y)(y < x \to g(y,\vec{a})\downarrow)\}$$
is called unbounded search (along the variable x) and is denoted by the symbol
$(\mu x)g(x,\vec{a})$. The function $\lambda\vec{y}.(\mu x)g(x,\vec{y})$ is defined precisely when the minimum
exists. □
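
As an informal companion to program F of A.3.16, unbounded search can be sketched in Python as a `while` loop. This is only an illustration under our own naming: `mu` and `ceil_sqrt` are hypothetical helpers, and a Python function that never returns stands in for an undefined value.

```python
from typing import Callable


def mu(g: Callable[..., int]) -> Callable[..., int]:
    """Return f with f(*a) = least x such that g(x, *a) == 0.

    Like the URM F, the loop never ends if g is never 0, or if some
    call g(y, *a) with y below the answer fails to return.
    """
    def f(*a: int) -> int:
        x = 0
        while g(x, *a) != 0:   # instruction k: test the z-value
            x += 1             # instruction k+l+1: x <- x + 1
        return x
    return f


# Hypothetical example: least x with x * x >= a.
ceil_sqrt = mu(lambda x, a: 0 if x * x >= a else 1)
```

The search order 0, 1, 2, ... is what forces the side condition that g(y, a) be defined for all y below the minimum.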


The result of Example A.3.16 yields at once: 


A.3.18 Proposition. $\mathcal{P}$ is closed under unbounded search; that is, if $\lambda x\vec{y}.g(x,\vec{y})$ is
in $\mathcal{P}$, then so is $\lambda\vec{y}.(\mu x)g(x,\vec{y})$.


A.3.19 Example. Is the function $\lambda\vec{x}_n.x_i$, where $1 \le i \le n$, in $\mathcal{P}$? Yes, and here is
a program, M, for it:


1:     w_1 ← 0
       ...
i:     z ← w_i   {Comment. Cf. Exercise A.3.9}
       ...
n:     w_n ← 0
n+1:   stop


$\lambda\vec{x}_n.x_i = M^{\vec{w}_n}_{z}$. To ensure that M indeed has the w_i as variables we reference them
in instructions at least once, in any manner whatsoever. □

A.3.2. Primitive Recursive Functions


Exercises A.3.3, A.3.6, and A.3.19 show that the successor, the zero, and the
generalized identity functions (which we will often name $S$, $Z$ and $U^n_i$, respectively)
are in $\mathcal{P}$; thus, not only are they “intuitively computable”, but they
are so in a precise mathematical sense. We have also shown that “computability”
of functions is preserved by the operations of composition, primitive recursion, and 
unbounded search. In this subsection we will explore the properties of the important 
set of functions known as primitive recursive. We introduce them by derivations just 
as we introduced the theorems of logic. 


A.3.20 Definition. ($\mathcal{PR}$-derivations; $\mathcal{PR}$-functions) A $\mathcal{PR}$-derivation is a finite
sequence of number-theoretic functions that obeys, in its step-by-step construction,
the following requirements. At each step we may write:

(1) Any one of $Z$, $S$, $U^n_i$ (for any $n > 0$ and any $0 < i \le n$).

(2) $\lambda\vec{z}.f(g_1(\vec{z}),\ldots,g_n(\vec{z}))$, provided each of $f, g_1,\ldots,g_n$ has already been written.

(3) prim(h, g), provided appropriate h and g have already been written. Note that
h and g are “appropriate” (cf. A.3.13) as long as g has two more arguments than h.


A function f is primitive recursive, or a $\mathcal{PR}$-function, iff it occurs in some
$\mathcal{PR}$-derivation. The functions allowed in step (1) are called initial functions. We
will denote this set by $\mathcal{I}$. The set of all $\mathcal{PR}$-functions will be denoted by $\mathcal{PR}$. □


A.3.21 Remark. The above definition defines essentially Dedekind's ([8]) “recursive”
functions. Subsequently they have been renamed primitive recursive, allowing
the unqualified term recursive to be synonymous with computable and apply to the
functions of $\mathcal{R}$ (cf. A.3.2).

The concept of a $\mathcal{PR}$-derivation is entirely analogous with those of proof,
formula-calculation, and term-calculation. Vis-à-vis proofs, derivations have the following
analogous elements: initial functions (vs. axioms) and the operations composition
and primitive recursion (vs. the rules of inference, Leib and Eqn).

As was the case with proofs (1.4.8 and 4.2.9), we can cut the tail off a derivation
and still have a derivation. Thus, a $\mathcal{PR}$-function is one that appears at the end of a
$\mathcal{PR}$-derivation.

Properties of primitive recursive functions can be proved by induction on derivation 
length, just as properties of theorems can be (and have been in this volume) proved 
by induction on the length of proofs. 

That a certain function is primitive recursive can be proved by exhibiting a derivation
for it, just as is done for the certification of a theorem: We exhibit a proof.
However, in proving theorems we accept the use of known theorems in proofs (cf.
2.1.6). Similarly, if we know that certain functions are primitive recursive, then
we immediately infer that so is one obtained from them by an allowed operation
(composition, primitive recursion, or yet-to-be-introduced derived operations). For
example, if h and g are in $\mathcal{PR}$ and prim(h, g) makes sense according to A.3.13,
then the latter is in $\mathcal{PR}$, too, since we can concatenate derivations of h and g and add
prim(h, g) to the right end.

In analogy to the case of theorem proving, where we benefit from powerful derived
rules, certifying functions as primitive recursive is greatly facilitated
by the introduction of derived operations on functions beyond the two we assumed
as given outright (primary operations) in Definition A.3.20. □


A.3.22 Theorem. $\mathcal{PR}$ contains $\mathcal{I}$ and is closed under primitive recursion and
composition. Indeed, of all possible sets that include $\mathcal{I}$ and are closed under these two
operations, $\mathcal{PR}$ is the smallest with respect to inclusion.


Proof. That $\mathcal{PR}$ contains $\mathcal{I}$ is immediate from A.3.20. Why it is closed under
primitive recursion was outlined in Remark A.3.21 and the case of composition is
analogous (see Exercise A.3.23). Let now $\mathcal{S}$ be a set that includes $\mathcal{I}$ and is closed
under the two operations. By induction on the length of derivations, we can prove
that if $f \in \mathcal{PR}$, then $f \in \mathcal{S}$.

For the basis, let f occur in a derivation of length 1. Then it is in $\mathcal{I}$ and we are
done. Assume the claim for all f that appear in derivations of length $\le n$ and
consider an f that appears in one of length n + 1. If it appears before the last step,
then we are done by the I.H. Let it then appear only at the last step. If it is in $\mathcal{I}$,
we are done by assumption on $\mathcal{S}$. Let then f = prim(h, g) where h and g show up
earlier in the derivation under consideration. By I.H. both h and g are in $\mathcal{S}$. As this
set is closed under primitive recursion, it contains f as well. The case of composition
causing the presence of f in the derivation is similar (see Exercise A.3.23). □


A.3.23 Exercise. Provide the missing details in the proof of A.3.22. □


A.3.24 Remark. (Induction on $\mathcal{PR}$) Just as we do “induction on formulae” we can
do induction on $\mathcal{PR}$ toward proving a property $\mathscr{P}(f)$ for all $f \in \mathcal{PR}$. We prove:


(1) (Basis) Any one of $Z$, $S$, $U^n_i$ (for any $n > 0$ and any $0 < i \le n$) has the property $\mathscr{P}$.
(2) $\lambda\vec{z}.f(g_1(\vec{z}),\ldots,g_n(\vec{z}))$ has the property, provided each of $f, g_1,\ldots,g_n$ does.
(3) prim(h, g) has the property, provided h and g do.


The above procedure is more elegant (and more widely used) than induction on the
length of $\mathcal{PR}$-derivations and is immediately based on the latter. Alternatively, it
directly follows from A.3.22: Let $\mathcal{S} = \{f : f \text{ is number-theoretic and } \mathscr{P}(f) \text{ holds}\}$,
where $\mathscr{P}$ satisfies (1)-(3). But then $\mathcal{S}$ contains the initial functions and is closed
under composition and primitive recursion (verify!). Thus $\mathcal{PR} \subseteq \mathcal{S}$, or $f \in \mathcal{PR}$
implies that $\mathscr{P}(f)$ holds. □


A.3.25 Example. If $\lambda xyw.f(x,y,w)$ and $\lambda z.g(z)$ are in $\mathcal{PR}$, how about
$\lambda xzw.f(x,g(z),w)$? It is in $\mathcal{PR}$ since


$$\lambda xzw.f(x,g(z),w) = \lambda xzw.f\big(U^3_1(x,z,w),\ g(U^3_2(x,z,w)),\ U^3_3(x,z,w)\big)$$

and the $U^n_i$ are primitive recursive. The reader will see at once that to the right of
“=” we have correctly formed compositions as expected by A.3.12.
Similarly, for the same functions above,


(1) $\lambda yw.f(2,y,w)$ is in $\mathcal{PR}$. Indeed, this function can be obtained by composition,
since


$$\lambda yw.f(2,y,w) = \lambda yw.f\big(SSZ(U^2_1(y,w)),\ y,\ w\big)$$
where I wrote “SSZ(...)” as short for S(S(Z(...))) for visual clarity. Clearly,
using $SSZ(U^2_2(y,w))$ above works as well.


(2) $\lambda xyw.f(y,x,w)$ is in $\mathcal{PR}$. Indeed, this function can be obtained by composition,
since


$$\lambda xyw.f(y,x,w) = \lambda xyw.f\big(U^3_2(x,y,w),\ U^3_1(x,y,w),\ U^3_3(x,y,w)\big)$$


In this connection, note that while $\lambda xy.g(x,y) = \lambda yx.g(y,x)$, yet $\lambda xy.g(x,y) \ne
\lambda xy.g(y,x)$ in general. For example, $\lambda xy.x \dot{-} y$ asks that we subtract the second
input (y) from the first (x), but $\lambda xy.y \dot{-} x$ asks that we subtract the first input (x)
from the second (y).


(3) $\lambda xy.f(x,y,x)$ is in $\mathcal{PR}$. Indeed, this function can be obtained by composition,
since


$$\lambda xy.f(x,y,x) = \lambda xy.f\big(U^2_1(x,y),\ U^2_2(x,y),\ U^2_1(x,y)\big)$$


(4) $\lambda xyzwu.f(x,y,w)$ is in $\mathcal{PR}$. Indeed, this function can be obtained by
composition, since


$$\lambda xyzwu.f(x,y,w) =
\lambda xyzwu.f\big(U^5_1(x,y,z,w,u),\ U^5_2(x,y,z,w,u),\ U^5_4(x,y,z,w,u)\big)$$
□


The above are summarized, named, and generalized in the following straightfor- 
ward exercise: 


A.3.26 Exercise. (Grzegorczyk Substitution Operations [18]) $\mathcal{PR}$ is closed under
the following operations:


(i) Substitution of a function invocation for a variable:
From $\lambda\vec{x}y\vec{z}.f(\vec{x},y,\vec{z})$ and $\lambda\vec{w}.g(\vec{w})$ obtain $\lambda\vec{x}\vec{w}\vec{z}.f(\vec{x},g(\vec{w}),\vec{z})$.


(ii) Substitution of a constant for a variable:
From $\lambda\vec{x}y\vec{z}.f(\vec{x},y,\vec{z})$ obtain $\lambda\vec{x}\vec{z}.f(\vec{x},k,\vec{z})$.


(iii) Interchange of two variables:
From $\lambda\vec{x}y\vec{z}w\vec{u}.f(\vec{x},y,\vec{z},w,\vec{u})$ obtain $\lambda\vec{x}y\vec{z}w\vec{u}.f(\vec{x},w,\vec{z},y,\vec{u})$.

(iv) Identification of two variables:
From $\lambda\vec{x}y\vec{z}w\vec{u}.f(\vec{x},y,\vec{z},w,\vec{u})$ obtain $\lambda\vec{x}y\vec{z}\vec{u}.f(\vec{x},y,\vec{z},y,\vec{u})$.


(v) Introduction of “don't care” variables:
From $\lambda\vec{x}.f(\vec{x})$ obtain $\lambda\vec{x}\vec{z}.f(\vec{x})$. □


By A.3.26 composition can simulate the Grzegorczyk operations if the initial
functions $\mathcal{I}$ are present. Of course, (i) alone can in turn simulate composition. With
these comments out of the way, we see that the “rigidity” of Definition A.3.12 is
gone.


A.3.27 Example. The definition of primitive recursion is also rigid, but this rigidity
is removable as well. For example, natural and simple recursions such as p(0) = 0
and p(x + 1) = x (this one defining $p = \lambda x.x \dot{-} 1$) do not fit the schema of
Definition A.3.13, which requires that the defined function has one more variable
than the basis, so it cannot have only one variable! We can get around this. Define
first $\tilde{p} = \lambda xy.x \dot{-} 1$ as follows: $\tilde{p}(0,y) = 0$ and $\tilde{p}(x+1,y) = x$. Now this can be
dressed up according to the syntax of the schema in A.3.13,


$$\begin{aligned} \tilde{p}(0,y) &= Z(y) \\ \tilde{p}(x+1,y) &= U^3_1(x,y,\tilde{p}(x,y)) \end{aligned}$$


that is, $\tilde{p} = prim(Z, U^3_1)$. Then we can get p by (Grzegorczyk) substitution: $p =
\lambda x.\tilde{p}(x,0)$. Incidentally, this shows that both p and $\tilde{p}$ are in $\mathcal{PR}$.

Another rigidity in the definition of primitive recursion is that, apparently, one
can use only the first variable as the iterating variable. Consider, for example,
$sub = \lambda xy.x \dot{-} y$. Clearly, sub(x, 0) = x and sub(x, y + 1) = p(sub(x, y)) is
correct semantically, but the format is wrong: We are not supposed to iterate along
the second variable! Well, define instead $\widetilde{sub} = \lambda xy.y \dot{-} x$:


$$\begin{aligned} \widetilde{sub}(0,y) &= U^1_1(y) \\ \widetilde{sub}(x+1,y) &= p\big(U^3_3(x,y,\widetilde{sub}(x,y))\big) \end{aligned}$$


Then, using variable swapping (Grzegorczyk operation (iii)), we can get sub: $sub =
\lambda xy.\widetilde{sub}(y,x)$. Clearly, both sub and $\widetilde{sub}$ are in $\mathcal{PR}$. With practice, one gets used
to accepting at once simplified recursions like the one for p and sub. One needs to
make them conform to the format of A.3.13 only if the instructor insists! □
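
The two “dressed up” recursions of this example can be replayed mechanically. Below is a small Python sketch under our own assumptions: `prim`, `Z`, and `U` are hypothetical stand-ins for the schema of A.3.13 and the initial functions, and `p_tilde` and `sub_tilde` name the two-variable auxiliary functions.

```python
def prim(h, g):
    """f(0, *y) = h(*y); f(x + 1, *y) = g(x, *y, f(x, *y))."""
    def f(x, *y):
        value = h(*y)
        for i in range(x):
            value = g(i, *y, value)
        return value
    return f


def Z(y):
    """The zero function."""
    return 0


def U(n, i):
    """Generalized identity U^n_i: pick the i-th of n arguments (1-based)."""
    return lambda *xs: xs[i - 1]


# p~(x, y) = x -. 1, that is, prim(Z, U^3_1); then p = lambda x. p~(x, 0).
p_tilde = prim(Z, U(3, 1))
p = lambda x: p_tilde(x, 0)

# sub~(x, y) = y -. x iterates along its FIRST variable, as A.3.13 demands.
sub_tilde = prim(U(1, 1), lambda x, y, z: p(U(3, 3)(x, y, z)))

# Swapping the variables gives sub(x, y) = x -. y.
sub = lambda x, y: sub_tilde(y, x)
```

Note how `sub` is obtained only at the very end, by swapping arguments, exactly as Grzegorczyk operation (iii) is used in the text.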


A.3.28 Exercise. Prove that $\lambda xy.x + y$ and $\lambda xy.x \times y$ are primitive recursive. Of
course, we will usually write multiplication $x \times y$ in “implied notation”, xy. □


A.3.29 Example. The very important “switch” (or “if-then-else”) function sw =
$\lambda xyz.$if x = 0 then y else z is primitive recursive. It is directly obtained by primitive
recursion on initial functions: sw(0, y, z) = y and sw(x + 1, y, z) = z. □


A.3.30 Exercise. Dress up the recursion sw(0, y, z) = y and sw(x + 1, y, z) = z to
bring it into the format required by Definition A.3.13. □

A.3.31 Exercise. Prove by induction on derivation lengths that all functions in $\mathcal{PR}$
are total. □


A.3.32 Proposition. $\mathcal{PR} \subseteq \mathcal{R}$.


Proof. By A.3.11, A.3.15, and A.3.22, $\mathcal{PR} \subseteq \mathcal{P}$. But all the functions in $\mathcal{PR}$ are
total (cf. A.3.31 and Definition A.3.2). □

Indeed, the above inclusion is proper, but the proof is beyond our scope (cf. for
example, [49]). We also state for the record:


A.3.33 Proposition. $\mathcal{R}$ is closed under both composition and primitive recursion.


Proof. Because $\mathcal{P}$ is, and both operations conserve totalness. □


A.3.34 Example. Consider the function ex given by


$$\begin{aligned} ex(x,0) &= 1 \\ ex(x,y+1) &= ex(x,y)\,x \end{aligned}$$


Thus, if x = 0, then ex(x, 0) = 1, but ex(x, y) = 0 for all y > 0. On the other hand,
if x > 0, then $ex(x,y) = x^y$ for all y.

Note that $x^y$ is “mathematically” undefined when x = y = 0.¹⁶ Thus, by
Exercise A.3.31 the exponential cannot be a primitive recursive function!

This is rather silly, since the computational process for the exponential is so
straightforward; thus it is a shame to declare the function non-$\mathcal{PR}$. After all, we
know exactly where and how it is undefined and we can remove this undefinability
by redefining “$x^y$” to mean ex(x, y) for all inputs.

Clearly $ex \in \mathcal{PR}$. We do this kind of redefinition a lot in computability in order
to remove easily recognizable points of “nondefinition” of calculable functions. We
will see further examples, such as the remainder, quotient, and logarithm functions.
Caution! We cannot always remove points of nondefinition of a calculable function
and still obtain a computable function. □
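
For a concrete feel of the redefinition, here is a short Python sketch that follows the two equations for ex literally (recursion along the second variable, as written in the example). The function name `ex` comes from the text; everything else is illustrative only.

```python
def ex(x: int, y: int) -> int:
    """ex(x, 0) = 1 and ex(x, y + 1) = ex(x, y) * x, computed by iteration."""
    value = 1                  # ex(x, 0) = 1, also when x = 0
    for _ in range(y):
        value = value * x      # ex(x, y + 1) = ex(x, y) * x
    return value
```

In particular the loop runs zero times when y = 0, which is where the convention ex(0, 0) = 1 comes from.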


A.3.35 Definition. A relation $R(\vec{x})$ is (primitive) recursive iff its characteristic
function,

$$\chi_R = \lambda\vec{x}.\begin{cases} 0 & \text{if } R(\vec{x}) \\ 1 & \text{if } \neg R(\vec{x}) \end{cases}$$

is (primitive) recursive. The set of all primitive recursive (respectively, recursive)
relations is denoted by $\mathcal{PR}_*$ (respectively, $\mathcal{R}_*$). □


Computability theory practitioners often call relations predicates. It is clear that
one can go from relation to characteristic function and back in a unique way, since
$R(\vec{x}) \equiv \chi_R(\vec{x}) = 0$. Thus, we may think of relations as “0-1 valued” functions. The
concept of relation simplifies further development of the theory of primitive recursive
functions.


¹⁶ In first-year university calculus we learn that “$0^0$” is an “indeterminate form”.

The following is useful:


A.3.36 Proposition. $R(\vec{x}) \in \mathcal{PR}_*$ iff some $f \in \mathcal{PR}$ exists such that, for all $\vec{x}$,
$R(\vec{x}) \equiv f(\vec{x}) = 0$.


Proof. For the if-part, I want $\chi_R \in \mathcal{PR}$. This is so since $\chi_R = \lambda\vec{x}.1 \dot{-} (1 \dot{-} f(\vec{x}))$
(using Grzegorczyk substitution and $\lambda xy.x \dot{-} y$). For the only if-part, $f = \chi_R$ will
do. □


A.3.37 Corollary. $R(\vec{x}) \in \mathcal{R}_*$ iff some $f \in \mathcal{R}$ exists such that, for all $\vec{x}$, $R(\vec{x}) \equiv
f(\vec{x}) = 0$.


Proof. By the above proof, A.3.32, and A.3.33. □

A.3.38 Corollary. $\mathcal{PR}_* \subseteq \mathcal{R}_*$.

Proof. By the above corollary and A.3.32. □


A.3.39 Theorem. $\mathcal{PR}_*$ is closed under the Boolean operations.


Proof. It suffices to look at the cases of $\neg$ and $\lor$.


($\neg$) Say, $R(\vec{x}) \in \mathcal{PR}_*$. Thus (A.3.35), $\chi_R \in \mathcal{PR}$. But then $\chi_{\neg R} \in \mathcal{PR}$, since
$\chi_{\neg R} = \lambda\vec{x}.1 \dot{-} \chi_R(\vec{x})$, by Grzegorczyk substitution and $\lambda xy.x \dot{-} y \in \mathcal{PR}$.


($\lor$) Let $R(\vec{x}) \in \mathcal{PR}_*$ and $Q(\vec{y}) \in \mathcal{PR}_*$. Then $\lambda\vec{x}\vec{y}.\chi_{R\lor Q}(\vec{x},\vec{y})$ is given by
$$\chi_{R\lor Q}(\vec{x},\vec{y}) = \text{if } R(\vec{x}) \text{ then } 0 \text{ else } \chi_Q(\vec{y})$$
and therefore is in $\mathcal{PR}$. □


It is common practice to use $R(\vec{x})$ and $\chi_R(\vec{x})$ (almost) interchangeably. For example,
“if $R(\vec{x})$ then ...” is the same as “if $\chi_R(\vec{x}) = 0$ then ...”. The latter more directly
shows that a (Grzegorczyk) substitution was effected into an argument of the
if-then-else (A.3.29) function:


      $\chi_R(\vec{x})$
          ↓
    if    x    = 0 then ...


thus establishing the primitive recursiveness of the resulting function.


A.3.40 Remark. Alternatively, note that $\chi_{R\lor Q}(\vec{x},\vec{y}) = \chi_R(\vec{x}) \times \chi_Q(\vec{y})$. □


A.3.41 Corollary. $\mathcal{R}_*$ is closed under the Boolean operations.
Proof. As above, mindful of A.3.32 and A.3.33. □


A.3.42 Example. The relations $x \le y$, $x < y$, $x = y$ are in $\mathcal{PR}_*$. See A.3.4 for a
refresher on our conventions regarding lambda notation and relations.
With this out of the way, note that $x \le y \equiv x \dot{-} y = 0$ and invoke A.3.36. Finally
invoke Boolean closure and note that $x < y \equiv \neg\, y \le x$ while $x = y$ is equivalent to
$x \le y \land y \le x$. □


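A.3.43 Proposition. If $R(\vec{x},y,\vec{z}) \in \mathcal{PR}_*$ and $\lambda\vec{w}.f(\vec{w}) \in \mathcal{PR}$, then $R(\vec{x},f(\vec{w}),\vec{z})$
is in $\mathcal{PR}_*$.


Proof. If $Q(\vec{x},\vec{w},\vec{z})$ denotes $R(\vec{x},f(\vec{w}),\vec{z})$, then $\chi_Q(\vec{x},\vec{w},\vec{z}) = \chi_R(\vec{x},f(\vec{w}),\vec{z})$.
□


A.3.44 Proposition. If $R(\vec{x},y,\vec{z}) \in \mathcal{R}_*$ and $\lambda\vec{w}.f(\vec{w}) \in \mathcal{R}$, then $R(\vec{x},f(\vec{w}),\vec{z})$ is
in $\mathcal{R}_*$.


Proof. Similar to that of A.3.43. □


A.3.45 Corollary. If $f \in \mathcal{PR}$ (respectively, in $\mathcal{R}$), then its graph, $z = f(\vec{x})$, is in
$\mathcal{PR}_*$ (respectively, in $\mathcal{R}_*$).


Proof. Using the relation z = y and A.3.43. □


The following converse of A.3.45, “if $z = f(\vec{x})$ is in $\mathcal{PR}_*$ and f is total, then
$f \in \mathcal{PR}$”, is not true. A counterexample is provided by the Ackermann function.¹⁷
However, “if $z = f(\vec{x})$ is in $\mathcal{R}_*$ and f is total, then $f \in \mathcal{R}$” is true. Cf. [49] and the
exercise below.


A.3.46 Exercise. Using unbounded search, prove that if $z = f(\vec{x})$ is in $\mathcal{R}_*$ and f is
total, then $f \in \mathcal{R}$. □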


A.3.47 Definition. (Bounded Quantifiers) The shorthand notations $(\forall y)_{<z}R(y,\vec{x})$
and $(\exists y)_{<z}R(y,\vec{x})$ stand for $(\forall y)(y < z \to R(y,\vec{x}))$ and $(\exists y)(y < z \land R(y,\vec{x}))$
respectively, and similarly for the nonstrict inequality “$\le$” (cf. the general case on
p. 175). □


A.3.48 Theorem. $\mathcal{PR}_*$ is closed under bounded quantification.


Proof. By A.3.39 it suffices to look at the case of $(\exists y)_{<z}$, since $(\forall y)_{<z}R(y,\vec{x}) \equiv
\neg(\exists y)_{<z}\neg R(y,\vec{x})$.

Let then $R(y,\vec{x}) \in \mathcal{PR}_*$ and let us give the name $Q(z,\vec{x})$ to $(\exists y)_{<z}R(y,\vec{x})$. We
note that $Q(0,\vec{x})$ is false (why?) and $Q(z+1,\vec{x}) \equiv Q(z,\vec{x}) \lor R(z,\vec{x})$. Thus,


$$\begin{aligned} \chi_Q(0,\vec{x}) &= 1 \\ \chi_Q(z+1,\vec{x}) &= \chi_Q(z,\vec{x})\,\chi_R(z,\vec{x}) \end{aligned}$$ □


¹⁷ The so-called Ritchie version of the Ackermann function $\lambda nx.A_n(x)$ is given by a “double recursion”
for all n, x: $A_0(x) = x + 2$, $A_{n+1}(0) = 2$, $A_{n+1}(x+1) = A_n(A_{n+1}(x))$.

A.3.49 Corollary. $\mathcal{R}_*$ is closed under bounded quantification.


A.3.50 Exercise. The operations of bounded summation and bounded multiplication
are quite handy.

These are applied on a function f and yield the functions $\lambda z\vec{x}.\sum_{i<z} f(i,\vec{x})$ and
$\lambda z\vec{x}.\prod_{i<z} f(i,\vec{x})$ respectively, where $\sum_{i<0} f(i,\vec{x}) = 0$ and $\prod_{i<0} f(i,\vec{x}) = 1$ by
definition. Prove that $\mathcal{PR}$ and $\mathcal{R}$ are closed under both operations; i.e., if f is in $\mathcal{PR}$
(respectively, in $\mathcal{R}$), then so are $\lambda z\vec{x}.\sum_{i<z} f(i,\vec{x})$ and $\lambda z\vec{x}.\prod_{i<z} f(i,\vec{x})$. □


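A.3.51 Definition. (Bounded Search) Let f be a total number-theoretic function of
n + 1 variables. The symbol $(\mu y)_{<z} f(y,\vec{x})$, for all $z, \vec{x}$, stands for


$$\begin{cases} \min\{y : y < z \land f(y,\vec{x}) = 0\} & \text{if } (\exists y)_{<z}\, f(y,\vec{x}) = 0 \\ z & \text{otherwise} \end{cases}$$


We define “$(\mu y)_{\le z}$” to mean “$(\mu y)_{<z+1}$”. □


A.3.52 Theorem. $\mathcal{PR}$ is closed under the bounded search operation $(\mu y)_{<z}$. That
is, if $\lambda y\vec{x}.f(y,\vec{x}) \in \mathcal{PR}$, then $\lambda z\vec{x}.(\mu y)_{<z} f(y,\vec{x}) \in \mathcal{PR}$.


Proof. Set $g = \lambda z\vec{x}.(\mu y)_{<z} f(y,\vec{x})$. Then the following primitive recursion settles
it (note that $g(0,\vec{x}) = 0$ by the “otherwise” clause of A.3.51 with z = 0):


$$\begin{aligned} g(0,\vec{x}) &= 0 \\ g(z+1,\vec{x}) &= \text{if } g(z,\vec{x}) < z \text{ then } g(z,\vec{x}) \\ &\quad \text{else if } f(z,\vec{x}) = 0 \text{ then } z \\ &\quad \text{else } z + 1 \end{aligned}$$ □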
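
The recursion in the proof can be traced step by step in code. The sketch below is an illustration under our own naming (`bounded_mu` and the sample `f` are hypothetical); it starts from the value 0 at z = 0, which is what the “otherwise” clause of A.3.51 gives for an empty search range.

```python
def bounded_mu(f):
    """Return g with g(z, *x) = (mu y)_{<z} f(y, *x), via the recursion on z."""
    def g(z, *x):
        value = 0                      # g(0, x): empty range, "otherwise" gives z = 0
        for t in range(z):             # compute g(t + 1, x) from g(t, x)
            if value < t:
                pass                   # a witness below t was already found; keep it
            elif f(t, *x) == 0:
                value = t              # t itself is the least witness
            else:
                value = t + 1          # still no witness below t + 1
        return value
    return g


# Hypothetical total test function: f(y, a) = 0 iff y * y >= a.
f = lambda y, a: 0 if y * y >= a else 1
search = bounded_mu(f)
```

Unlike unbounded search, the loop always ends after z passes, which is why totality is preserved.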


A.3.53 Exercise. Mindful of the comment following A.3.39, dress up the primitive
recursion that defined g above so that it conforms to the Definition A.3.13. □


A.3.54 Corollary. $\mathcal{PR}$ is closed under the bounded search operation $(\mu y)_{\le z}$.

A.3.55 Exercise. Prove the corollary. □


A.3.56 Corollary. $\mathcal{R}$ is closed under the bounded search operations $(\mu y)_{<z}$ and
$(\mu y)_{\le z}$.


Consider now a set of mutually exclusive relations $R_i(\vec{x})$, $i = 1,\ldots,n$, that is,
$R_i(\vec{x}) \land R_j(\vec{x})$ is false for each $\vec{x}$ as long as $i \ne j$.
Then we can define a function f by cases $R_i$ from given functions $f_j$ by the
requirement (for all $\vec{x}$) given below:

$$f(\vec{x}) = \begin{cases} f_1(\vec{x}) & \text{if } R_1(\vec{x}) \\ f_2(\vec{x}) & \text{if } R_2(\vec{x}) \\ \quad\vdots \\ f_n(\vec{x}) & \text{if } R_n(\vec{x}) \\ f_{n+1}(\vec{x}) & \text{otherwise} \end{cases}$$


where, as is usual in mathematics, “if $R_j(\vec{x})$” is short for “if $R_j(\vec{x})$ is true” and
“otherwise” is the condition $\neg(R_1(\vec{x}) \lor \cdots \lor R_n(\vec{x}))$. We have the following result:


A.3.57 Theorem. (Definition by Cases) If the functions $f_i$, $i = 1,\ldots,n+1$ and
the relations $R_i(\vec{x})$, $i = 1,\ldots,n$ are in $\mathcal{PR}$ and $\mathcal{PR}_*$ respectively, then so is f
above.


Proof. Either by repeated use (composition) of if-then-else or by noting, mindful
of A.3.39, that


$$f(\vec{x}) = f_1(\vec{x})(1 \dot{-} \chi_{R_1}(\vec{x})) + \cdots + f_n(\vec{x})(1 \dot{-} \chi_{R_n}(\vec{x})) +
f_{n+1}(\vec{x})(1 \dot{-} \chi_{\neg(R_1 \lor \cdots \lor R_n)}(\vec{x}))$$ □


A.3.58 Corollary. Same statement as above, replacing $\mathcal{PR}$ and $\mathcal{PR}_*$ by $\mathcal{R}$ and $\mathcal{R}_*$
respectively.


The tools we now have at our disposal allow easy certification of the primitive
recursiveness of some very useful functions and relations. But first a definition:

A.3.59 Definition. $(\mu y)_{<z} R(y,\vec{x})$ means $(\mu y)_{<z}\chi_R(y,\vec{x})$. □

Thus, if $R(y,\vec{x}) \in \mathcal{PR}_*$ (resp. $\in \mathcal{R}_*$), then $\lambda z\vec{x}.(\mu y)_{<z}R(y,\vec{x}) \in \mathcal{PR}$ (resp.
$\in \mathcal{R}$), since $\chi_R \in \mathcal{PR}$ (resp. $\in \mathcal{R}$).

A.3.60 Example. The following are in $\mathcal{PR}$ or $\mathcal{PR}_*$ as appropriate:


(1) $\lambda xy.\lfloor x/y\rfloor$ ¹⁸ (the quotient of the division x/y). This is another instance of
a nontotal function with an “obvious” way to remove the points where it is
undefined. Thus the symbol is extended to mean $(\mu z)_{\le x}((z+1)y > x)$ for
all x, y. It follows that, for y > 0, $\lfloor x/y\rfloor$ is as expected in normal math, while
$\lfloor x/0\rfloor = x + 1$.


(2) $\lambda xy.rem(x,y)$ (the remainder of the division x/y). $rem(x,y) = x \dot{-} y\lfloor x/y\rfloor$.


¹⁸ The symbol “$\lfloor x\rfloor$” is called the floor of x. It succeeds in the literature (with the same definition) the
so-called “greatest integer function, [x]”, i.e., the integer part of the real number x.

(3) $\lambda xy.x \mid y$ (x divides y). $x \mid y \equiv rem(y,x) = 0$. Note that if y > 0, we cannot
have $0 \mid y$ (a good thing!) since rem(y, 0) = y. Our redefinition of $\lfloor x/y\rfloor$
yields, however, $0 \mid 0$, but we can live with this in practice.


(4) Pr(x) (x is a prime). $Pr(x) \equiv x > 1 \land (\forall y)_{\le x}(y \mid x \to y = 1 \lor y = x)$.


(5) $\pi(x)$ (the number of primes $\le x$).¹⁹ The following primitive recursion certifies
the claim: $\pi(0) = 0$, and $\pi(x+1) = $ if $Pr(x+1)$ then $\pi(x) + 1$ else $\pi(x)$.


(6) $\lambda n.p_n$ (the nth prime). First note that the graph $y = p_n$ is primitive recursive:
$y = p_n \equiv Pr(y) \land \pi(y) = n + 1$. Next note that, for all n, $p_n \le 2^{2^n}$ (see
Exercise A.3.61 below), thus $p_n = (\mu y)_{\le 2^{2^n}}(y = p_n)$, which settles the claim.


(7) $\lambda nx.\exp(n,x)$ (the exponent of $p_n$ in the prime factorization of x). $\exp(n,x) =
(\mu y)_{<x}\neg(p_n^{y+1} \mid x)$.

(8) Seq(x) (x's prime number factorization contains at least one prime, but no gaps).


$Seq(x) \equiv x > 1 \land (\forall y)_{\le x}(\forall z)_{\le x}(Pr(y) \land Pr(z) \land y < z \land z \mid x \to y \mid x)$. □
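
To see these definitions at work on small numbers, here is a Python sketch that transcribes items (1) through (7) using a bounded search helper. All function names are ours (hypothetical), the code makes no attempt at efficiency, and the search stops at the first witness, so the large bound $2^{2^n}$ is never actually traversed for the small n used here.

```python
def bounded_mu(bound, pred):
    """(mu y)_{<bound} pred(y): least y < bound with pred(y), else bound."""
    for y in range(bound):
        if pred(y):
            return y
    return bound


def monus(a, b):
    """Proper subtraction a -. b."""
    return a - b if a >= b else 0


def quot(x, y):
    # (mu z)_{<=x} ((z + 1) * y > x); in particular quot(x, 0) = x + 1
    return bounded_mu(x + 1, lambda z: (z + 1) * y > x)


def rem(x, y):
    return monus(x, y * quot(x, y))


def divides(x, y):
    return rem(y, x) == 0


def is_prime(x):
    # x > 1 and every y <= x that divides x is 1 or x
    return x > 1 and all(y in (1, x) for y in range(x + 1) if divides(y, x))


def prime_count(x):
    # pi(0) = 0; pi(t + 1) = pi(t) + 1 if t + 1 is prime, else pi(t)
    count = 0
    for t in range(x):
        if is_prime(t + 1):
            count += 1
    return count


def nth_prime(n):
    # (mu y)_{<= 2^(2^n)} (Pr(y) and pi(y) = n + 1)
    return bounded_mu(2 ** (2 ** n) + 1,
                      lambda y: is_prime(y) and prime_count(y) == n + 1)


def exp(n, x):
    # (mu y)_{<x} not (p_n^(y+1) | x)
    p = nth_prime(n)
    return bounded_mu(x, lambda y: not divides(p ** (y + 1), x))
```

For instance, 12 = 2² · 3, so the exponents of p₀ = 2, p₁ = 3, p₂ = 5 in 12 are 2, 1, 0.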


A.3.61 Exercise. Prove by induction on n, that for all n we have $p_n \le 2^{2^n}$.

Hint. Consider, as Euclid did,²⁰ $p_0p_1\cdots p_n + 1$. If this number is prime, then it is
greater than or equal to $p_{n+1}$ (why?). If it is composite, then none of the primes up
to $p_n$ divide it. So any prime factor of it is greater than or equal to $p_{n+1}$ (why?). □


A.3.62 Exercise. Prove that $\lambda x.\lfloor\log_2 x\rfloor \in \mathcal{PR}$. Remove the undefinedness at x = 0
in some convenient manner. □


A.3.63 Definition. (Coding Sequences) Any sequence of numbers, $a_0,\ldots,a_n$, $n \ge
0$, is coded by the number $\langle a_0,\ldots,a_n\rangle$ defined as


$$\prod_{i\le n} p_i^{a_i+1}$$ □


For coding to be useful, we need a simple decoding scheme. Define then the
expressions:


(i) $(z)_i$ as shorthand for $\exp(i,z) \dot{-} 1$
(ii) $lh(z)$ (pronounced “length of z”) as shorthand for $(\mu y)_{<z}\neg(p_y \mid z)$
Note that


(a) $\lambda iz.(z)_i$ and $\lambda z.lh(z)$ are in $\mathcal{PR}$.


¹⁹ The $\pi$-function plays a central role in number theory, figuring in the so-called prime number theorem.
See for example, [30].
²⁰ In his proof that there are infinitely many primes.
(b) If Seq(z), then $z = \langle a_0,\ldots,a_n\rangle$ for some $a_0,\ldots,a_n$. In this case, $lh(z)$ equals
the number of distinct primes in the decomposition of z, that is, the length n + 1
of the coded sequence. Then $(z)_i$, for $i < lh(z)$, equals $a_i$. For larger i, $(z)_i = 0$.
Note that if $\neg Seq(z)$ then $lh(z)$ need not equal the number of distinct primes in
the decomposition of z. For example, 10 has 2 primes, but $lh(10) = 1$.


The tools lh, Seq(z), and $\lambda iz.(z)_i$ are sufficient to perform decoding, primitive
recursively, once the truth of Seq(z) is established. This coding/decoding is essentially
that of Gödel's ([16]) and we will use it in the following subsection.
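
A small worked instance may help: the sequence 3, 0, 2 is coded by $2^{4}\cdot 3^{1}\cdot 5^{3} = 6000$. The Python sketch below reproduces coding and decoding; the helper names (`encode`, `component`, `lh`, `is_seq`) are ours, primes are generated by naive trial division, and `is_seq` is an equivalent reformulation of “no gaps” rather than a transcription of the bounded-quantifier formula.

```python
from itertools import count, islice


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small examples)."""
    for n in count(2):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n


def nth_prime(i):
    return next(islice(primes(), i, None))


def encode(seq):
    """<a_0, ..., a_n> = product over i of p_i ** (a_i + 1)."""
    code = 1
    for i, a in enumerate(seq):
        code *= nth_prime(i) ** (a + 1)
    return code


def exp(i, z):
    """Exponent of p_i in z; 0 when z == 0, as the bounded search would give."""
    if z == 0:
        return 0
    p, e = nth_prime(i), 0
    while z % p == 0:
        z //= p
        e += 1
    return e


def component(z, i):
    return max(exp(i, z) - 1, 0)       # (z)_i = exp(i, z) -. 1


def lh(z):
    for y in range(z):                 # (mu y)_{<z} not (p_y | z)
        if z % nth_prime(y) != 0:
            return y
    return z


def is_seq(z):
    """True iff z > 1 and the primes dividing z are exactly p_0, ..., p_{lh(z)-1}."""
    if z <= 1:
        return False
    for i in range(lh(z)):
        p = nth_prime(i)
        while z % p == 0:
            z //= p
    return z == 1
```

The “+1” in each exponent is what lets a 0 entry leave a visible trace, so that the length of the sequence can be recovered.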


We conclude this subsection with a flexible (seeming) extension of primitive
recursion. Simple primitive recursion defines a function “at n + 1” in terms of its
value “at n”. However we also have examples of “recursions” (or “recurrences”), one
of the best known perhaps being the Fibonacci sequence, 0, 1, 1, 2, 3, 5, 8, ..., that is
given by $F_0 = 0$, $F_1 = 1$ and (for $n \ge 1$) $F_{n+1} = F_n + F_{n-1}$, where the value at n + 1
depends on both the values at n and n − 1. This generalizes to recursions where the
value at n + 1 depends on the entire history, or course-of-values, of the function values
at n, n − 1, n − 2, ..., 1, 0. Compare with the contrast between simple induction
and “strong” induction (cf. “A crash course on induction”, p. 17). The easiest way
to utilize the course of values of a function f: $f(0,\vec{y}), f(1,\vec{y}),\ldots,f(x,\vec{y})$ prior to
x + 1, or “at x”, is to code it by a single number!


A.3.64 Definition. (Course-of-Values Recursion) We say that f, of n + 1 arguments,
is defined by a basis function $\lambda\vec{y}_n.b(\vec{y}_n)$ and an iterator $\lambda x\vec{y}_nz.g(x,\vec{y}_n,z)$ by
course-of-values recursion if for all $x, \vec{y}_n$ the following equations hold


$$\begin{aligned} f(0,\vec{y}_n) &= b(\vec{y}_n) \\ f(x+1,\vec{y}_n) &= g(x,\vec{y}_n,H(x,\vec{y}_n)) \end{aligned}$$


where $\lambda x\vec{y}_n.H(x,\vec{y}_n)$ is the history function, which “at x” is given (for all $\vec{y}_n$) by
$\langle f(0,\vec{y}_n), f(1,\vec{y}_n),\ldots,f(x,\vec{y}_n)\rangle$. □


The major result here is: 


A.3.65 Theorem. $\mathcal{PR}$ is closed under course-of-values recursion.


Proof. So, let b and g be in $\mathcal{PR}$. We will show that $f \in \mathcal{PR}$. It suffices to prove that
the history function H is primitive recursive, for then $f = \lambda x\vec{y}_n.(H(x,\vec{y}_n))_x$, and
we are done by Grzegorczyk substitution. To this end, the following equations (true
for all $x, \vec{y}_n$) settle the case:


$$\begin{aligned} H(0,\vec{y}_n) &= \langle b(\vec{y}_n)\rangle \\ H(x+1,\vec{y}_n) &= H(x,\vec{y}_n)\, p_{x+1}^{\,g(x,\vec{y}_n,H(x,\vec{y}_n))+1} \end{aligned}$$ □


A.3.66 Example. The Fibonacci sequence, $(F_n)_{n\ge 0}$, can be viewed as the function
$\lambda n.F_n$. As such it is in $\mathcal{PR}$. Indeed, letting $H_n$ be the history of the sequence
at n (that is, $\langle F_0,\ldots,F_n\rangle$) we have the following course-of-values recursion for
$\lambda n.F_n$ in terms of functions known to be in $\mathcal{PR}$.


$$\begin{aligned} F_0 &= 0 \\ F_{n+1} &= \text{if } n = 0 \text{ then } 1 \text{ else } (H_n)_n + (H_n)_{n\dot{-}1} \end{aligned}$$ □
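
The Fibonacci recursion through coded histories can be run directly, if rather wastefully. The following Python sketch uses our own hypothetical helper names and a naive prime generator; it builds $H_n$ by the equation $H_{x+1} = H_x \cdot p_{x+1}^{F_{x+1}+1}$ and then decodes $F_n$ from it.

```python
from itertools import count, islice


def nth_prime(i):
    """p_i by naive trial division: p_0 = 2, p_1 = 3, ..."""
    gen = (n for n in count(2)
           if all(n % d for d in range(2, int(n ** 0.5) + 1)))
    return next(islice(gen, i, None))


def component(z, i):
    """(z)_i = exp(i, z) -. 1; assumes z > 0, which holds for every history code."""
    p, e = nth_prime(i), 0
    while z % p == 0:
        z //= p
        e += 1
    return max(e - 1, 0)


def fib_history(n):
    """H_n = <F_0, ..., F_n>, built one prime power at a time."""
    history = nth_prime(0) ** (0 + 1)          # H_0 = <F_0> = <0> = 2
    for x in range(n):
        if x == 0:
            nxt = 1                            # F_1 = 1
        else:
            nxt = component(history, x) + component(history, x - 1)
        history *= nth_prime(x + 1) ** (nxt + 1)
    return history


def fib(n):
    return component(fib_history(n), n)        # F_n = (H_n)_n
```

For example, $H_2 = \langle 0, 1, 1\rangle = 2^1\cdot 3^2\cdot 5^2 = 450$. The codes grow very quickly, which is harmless for the theory but makes this a poor way to compute in practice.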


A.3.3. URM Computations 


As an “agent” executes the instructions of some URM, M, it generates at each step
instantaneous descriptions (IDs) of a computation. The information each such description
includes is simply the values of each variable of M, and the label (instruction number)
of the instruction that is about to be executed next, that is, the current instruction.

In this subsection we will arithmetize URMs and their computations, just as
Gödel did in the case of formal arithmetic and its proofs ([16]), and prove a cornerstone
result of computability, the “normal form theorem” of Kleene that, essentially,
says that the URM programming language is rich enough to allow us to write a universal
program for functions. Such a program, U, receives two inputs: One is a URM
description, M, and the other is “data”, x. U then simulates M on the data, behaving
exactly as M would on input x. Programmers may call such a program an interpreter
or compiler.

Toward this end, we will first define a relation T(z, x, y)—known in the literature 
as the Kleene T-predicate—that is true iff the URM coded by z, when it receives 
input x in its variable X1, will have a terminating computation that is coded by y. 
Thus we turn to coding URMs and URM computations. 


Normalizing Input/Output: There is clearly no loss of generality in assuming
that any URM that computes a unary function does so using X1 as input and output
variable. Such a URM will have at least two instructions, since the stop instruction
does not reference any variables.


We arithmetize or code a URM M as a sequence of numbers—coded by a single 
number as in A.3.63—where each number of the sequence is the code of some 
instruction. 


A.3.67 Definition. (Codes for Instructions) The instructions are coded as follows, where $X1^i$ is short for

$$X\underbrace{1 \cdots 1}_{i \text{ ones}}$$

(1) $L: X1^i \leftarrow a$ has code $\langle 1, L, i, a \rangle$.

(2) $L: X1^i \leftarrow X1^i + 1$ has code $\langle 2, L, i \rangle$.

(3) $L: X1^i \leftarrow X1^i \dot{-} 1$ has code $\langle 3, L, i \rangle$.

(4) $L:$ if $X1^i = 0$ goto $P$ else goto $R$ has code $\langle 4, L, i, P, R \rangle$.

(5) $L:$ stop has code $\langle 5, L \rangle$. □


The first component of each instruction code z, $(z)_0$, denotes the instruction type, the second, $(z)_1$, the label, and the remaining components give enough information for us to uniquely know what precise instruction we are talking about. For example, in $z = \langle 3, L, i \rangle$ we read that we are talking about the “decrement by one” instruction ($(z)_0 = 3$) applied to $X1^i$ ($(z)_2 = i$), which is found at label L ($(z)_1 = L$).

In turn, we code a URM M as an ordered sequence of numbers, each being a code for an instruction. Thus given a code z (i.e., z codes something: $Seq(z)$ holds) we can determine algorithmically whether z codes some URM. More precisely:


A.3.68 Theorem. The relation $URM(z)$ that holds precisely if z codes a URM is in $\mathcal{PR}_*$.


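Proof. In what follows we employ shorthand such as $(\exists z, w)_{<u}$ for $(\exists z)_{<u}(\exists w)_{<u}$, and similarly for longer quantifier groupings and for $\forall$.

$$\begin{aligned}
URM(z) \equiv{} & Seq(z) \land (z)_{lh(z) \dot{-} 1} = \langle 5, lh(z) \rangle\;{}^{21} \land {} \\
& (\forall L)_{<lh(z)}\big(L \neq lh(z) \dot{-} 1 \to ((z)_L)_0 \neq 5\big) \land {} \\
& (\forall L)_{<lh(z)}\Big(Seq((z)_L) \land \big[(\exists i, a)_{<z}\,(z)_L = \langle 1, L+1, i+1, a \rangle \lor {} \\
& \qquad (\exists i)_{<z}\big\{(z)_L = \langle 2, L+1, i+1 \rangle \lor (z)_L = \langle 3, L+1, i+1 \rangle \lor {} \\
& \qquad (\exists M, R)_{<lh(z)}\,(z)_L = \langle 4, L+1, i+1, M+1, R+1 \rangle\big\} \lor {} \\
& \qquad (z)_L = \langle 5, L+1 \rangle\big]\Big) \qquad \Box
\end{aligned}$$

The last disjunct covers the stop instruction itself; by the second conjunct it can apply only at $L = lh(z) \dot{-} 1$.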


A.3.69 Definition. An ID of a computation of a URM M is an ordered sequence $L; a_1, \ldots, a_r$, where all of M's variables are among the $X1, X11, \ldots, X1^r$ that we will denote by $x_1, \ldots, x_r$, and $a_i$ is the current value of $x_i$ immediately before instruction L is executed. L points precisely to the current instruction, meaning the next to be executed.

All IDs have the same length, and we say that ID $I_1 = L; a_1, \ldots, a_r$ yields ID $I_2 = P; b_1, \ldots, b_r$, in symbols $I_1 \vdash I_2$, exactly when


(i) L labels “$x_i \leftarrow c$”, and $I_1$ and $I_2$ are identical, except that $b_i = c$ and $P = L + 1$.


(ii) L labels “$x_i \leftarrow x_i + 1$”, and $I_1$ and $I_2$ are identical, except that $b_i = a_i + 1$ and $P = L + 1$.


(iii) L labels “$x_i \leftarrow x_i \dot{-} 1$”, and $I_1$ and $I_2$ are identical, except that $b_i = a_i \dot{-} 1$ and $P = L + 1$.


(iv) L labels “if $x_i = 0$ goto R else goto Q”, and $I_1$ and $I_2$ are identical, except that $P = R$ if $a_i = 0$, while $P = Q$ otherwise.


(v) L labels “stop”, and $I_1$ and $I_2$ are identical.


21 Note that $z = \langle (z)_0, \ldots, (z)_{lh(z) \dot{-} 1} \rangle$, but we have used positive labels; thus the last label is $lh(z)$. Similar comment about “$(\exists i, a)_{<z}\,(z)_L = \langle 1, L+1, i+1, a \rangle$”, etc. Why $i + 1$? Because the variables are X1, X11, X111, ...


A terminating computation of M with input $a_1, \ldots, a_k$ is a sequence $I_1, \ldots, I_n$ such that for all $i < n$ we have $I_i \vdash I_{i+1}$ and for some $j \le n$, $I_j$ has as 0th member the label of stop. Moreover, $I_1$ is initial; that is, if the input variables of M are, without loss of generality, the variables $x_1, \ldots, x_k$, then

$$I_1 = 1; a_1, \ldots, a_k, \underbrace{0, \ldots, 0}_{r-k \text{ 0s}} \qquad \Box$$


We code an ID $I = L; a_1, \ldots, a_r$ as $code(I) = \langle L, a_1, \ldots, a_r \rangle$ and a terminating computation $I_1, \ldots, I_n$ by $\langle code(I_1), \ldots, code(I_n) \rangle$.
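The “yields” relation and the notion of a terminating computation can be sketched in Python. This is a minimal illustration, not the arithmetization itself: instructions are represented as plain tuples shaped like the codes of A.3.67 instead of single numbers, an ID is a tuple `(L, a_1, ..., a_r)`, and the names `yields`, `terminating_computation`, `CLEAR_X1` and the cutoff `max_ids` are assumptions made here.

```python
def yields(program, ident):
    """Return the ID that `ident` yields under `program` (cases (i)-(v) of A.3.69)."""
    label, values = ident[0], list(ident[1:])
    instr = program[label - 1]                    # labels are 1-based
    kind = instr[0]
    if kind == 1:                                 # L: X1^i <- a
        values[instr[2] - 1] = instr[3]
        label += 1
    elif kind == 2:                               # L: X1^i <- X1^i + 1
        values[instr[2] - 1] += 1
        label += 1
    elif kind == 3:                               # L: X1^i <- X1^i -. 1 (proper subtraction)
        values[instr[2] - 1] = max(values[instr[2] - 1] - 1, 0)
        label += 1
    elif kind == 4:                               # L: if X1^i = 0 goto P else goto R
        label = instr[3] if values[instr[2] - 1] == 0 else instr[4]
    # kind == 5 (stop): the ID is left unchanged
    return (label, *values)


def terminating_computation(program, x, num_vars=1, max_ids=10_000):
    """IDs from the initial ID up to the first one at the stop label, or None on cutoff."""
    ident = (1, x) + (0,) * (num_vars - 1)        # initial ID: label 1, input in X1
    trace = [ident]
    while program[ident[0] - 1][0] != 5:
        if len(trace) >= max_ids:                 # assumption: give up after max_ids IDs
            return None
        ident = yields(program, ident)
        trace.append(ident)
    return trace


# Hypothetical example program: count X1 down to 0, then stop.
CLEAR_X1 = [(4, 1, 1, 4, 2), (3, 2, 1), (4, 3, 1, 4, 1), (5, 4)]
```

The cutoff exists only because a sketch cannot wait forever; the mathematical definition has no such bound.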


A.3.70 Theorem. The relation Comp(z,y) that is true iff y codes a terminating 
computation of the URM coded by z, which has “X1” as its only input variable (cf. 
preambular remarks of subsection A.3.3, p. 254), is primitive recursive. 


Proof. By the remark on p. 254 that normalizes the input/output convention, it must be that $lh(y) \ge 2$. Definition A.3.69 allows us the technical convenience to include into an ID enough room for more variables than may actually be present in the URM (that is coded by) z. Clearly (cf. A.3.67), a generous allowance for the length of an ID, as this is determined by the largest $i$ such that $X1^i$ occurs in z, is $\max\{(z)_i : i < lh(z)\}$. Even simpler (and more generous) is just z, which is the one we will work with. Thus, our IDs will each have length z + 1, allowing for the label information. Observe next that


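$$\begin{aligned}
Comp(z, y) \equiv{} & URM(z) \land Seq(y) \land (\forall i)_{<lh(y)}\big[Seq((y)_i) \land lh((y)_i) = z + 1\big] \land {} \\
& lh(y) > 1 \land (\forall j)_{<lh(y) \dot{-} 1}\, yield(z, (y)_j, (y)_{j+1}) \land {} \\
& \{\text{Comment. The last ID surely has the label of } z\text{'s stop.}\}\;\; ((y)_{lh(y) \dot{-} 1})_0 = lh(z) \land {} \\
& \{\text{Comment. The initial ID.}\}\;\; ((y)_0)_0 = 1 \land (\forall i)_{\le z}\big(1 < i \to ((y)_0)_i = 0\big)
\end{aligned}$$

The relation “$yield(z, (y)_j, (y)_{j+1})$” above says “URM z causes $(y)_j \vdash (y)_{j+1}$”.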


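The notation “$yield(z, u, v)$” is thus shorthand that expands as follows (cf. A.3.69 and A.3.67):

$$\begin{aligned}
yield(z, u, v) \equiv (\exists k)_{<z}(\exists L)_{<lh(z)}\Big(& L + 1 = (u)_0 \land k > 0 \land \Big\{ \\
& (\exists a)_{<z}\Big((z)_L = \langle 1, L+1, k, a \rangle \land v = 2 \big\lfloor u / p_k^{(u)_k + 1} \big\rfloor p_k^{a+1}\Big)\;{}^{22} \lor {} \\
& \big((z)_L = \langle 2, L+1, k \rangle \land v = 2 p_k u\big) \lor {} \\
& \big((z)_L = \langle 3, L+1, k \rangle \land v = 2(\text{if } (u)_k = 0 \text{ then } u \text{ else } \lfloor u / p_k \rfloor)\big) \lor {} \\
& (\exists P, R)_{\le lh(z)}\big((z)_L = \langle 4, L+1, k, P, R \rangle \land P > 0 \land R > 0 \land {} \\
& \qquad v = \text{if } (u)_k = 0 \text{ then } \lfloor u / 2^{L+2} \rfloor 2^{P+1} \text{ else } \lfloor u / 2^{L+2} \rfloor 2^{R+1}\big) \lor {} \\
& \big((z)_L = \langle 5, L+1 \rangle \land v = u\big)\Big\}\Big) \qquad \Box
\end{aligned}$$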


A.3.71 Corollary. (The Kleene T-predicate) The Kleene predicate $T(z, x, y)$ that is true precisely when the URM z with input x has a terminating computation y, is primitive recursive.


Proof. By earlier remarks, $T(z, x, y) \equiv Comp(z, y) \land ((y)_0)_1 = x$. □


Noting that for any predicate $R(y, \vec{x})$, $(\mu y)R(y, \vec{x})$ is alternative notation for $(\mu y)\chi_R(y, \vec{x})$, we have:


A.3.72 Corollary. (The Kleene Normal Form Theorem)


(1) For any URM M, if z is its code (A.3.67), then we have: $M_{X1}^{X1}$ is defined on input x iff $(\exists y)T(z, x, y)$.


(2) There is a primitive recursive function d such that for any $\lambda x.f(x) \in \mathcal{P}$ there is a number z and we have for all x:


$$f(x) \simeq d((\mu y)T(z, x, y))$$


Proof. Statement (1) is immediate as “$(\exists y)T(z, x, y)$” says that there is a terminating computation of M (coded as z) on input x.

For (2), we first remark that “$\simeq$” means that the two sides are both undefined, or both defined and numerically equal. Now, the role of d is to extract from a terminating computation's last ID its 1st component, recalling our definition, A.3.2, and our convention that unary functions are computed, without loss of generality (cf. p. 254), as $M_{X1}^{X1}$, for various M. Thus, for all y, $d(y) = ((y)_{lh(y) \dot{-} 1})_1$ will do. □


22 The effect of “$L + 1: X1^k \leftarrow a$” on ID $u = \langle L + 1, \ldots \rangle$ is to change L + 1 to L + 2 (effected by the factor 2) and change the current value of $X1^k$, $(u)_k$ (stored in the ID as a factor $p_k^{(u)_k + 1}$, a factor that we remove by dividing u by it) to a, this being stored in v as a factor $p_k^{a+1}$.


A.3.73 Remark. The normal form theorem says that every computable unary function, in the technical sense of A.3.2, can be expressed as an unbounded search followed by a composition, using a toolbox of just two primitive recursive functions(!): d and $\lambda zxy.\chi_T(z, x, y)$. This representation, or “normal form”, is parametrized by z, which denotes a URM M that computes the function in a normalized manner: as $M_{X1}^{X1}$. Thus what we set out to do at the beginning of this section is done: The two-input URM U that computes $\lambda zx.d((\mu y)T(z, x, y))$ (clearly a computable function, by closure properties of $\mathcal{P}$ (A.3.11 and A.3.18) and A.3.32) is universal, just as compilers are in computing. U accepts as inputs a program M coded as a number z, and data for said program, x. It then acts exactly as program z would on x, i.e., as $M_{X1}^{X1}$. □
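The shape “unbounded search followed by a composition” can be sketched in Python. The sketch rests on stated assumptions: programs are lists of instruction tuples shaped like the codes of A.3.67 rather than numbers, and the search runs over the number n of IDs in the (unique) trace from the initial ID instead of over numeric codes y of computations. The names `step`, `trace`, `T`, `d`, `universal` and `ADD_TWO` are hypothetical.

```python
from itertools import count


def step(program, ident):
    """One application of the "yields" relation of Definition A.3.69."""
    label, vals = ident[0], list(ident[1:])
    op = program[label - 1]                       # labels start at 1
    if op[0] == 1:                                # X1^i <- a
        vals[op[2] - 1] = op[3]
        label += 1
    elif op[0] == 2:                              # X1^i <- X1^i + 1
        vals[op[2] - 1] += 1
        label += 1
    elif op[0] == 3:                              # X1^i <- X1^i -. 1
        vals[op[2] - 1] = max(vals[op[2] - 1] - 1, 0)
        label += 1
    elif op[0] == 4:                              # if X1^i = 0 goto P else goto R
        label = op[3] if vals[op[2] - 1] == 0 else op[4]
    return (label, *vals)                         # stop leaves the ID unchanged


def trace(program, x, n, num_vars):
    """The unique sequence of n IDs that starts at the initial ID for input x."""
    ident = (1, x) + (0,) * (num_vars - 1)
    ids = [ident]
    while len(ids) < n:
        ident = step(program, ident)
        ids.append(ident)
    return ids


def T(program, x, n, num_vars):
    """Stand-in for T(z, x, y): the n-ID trace ends at the stop label."""
    last = trace(program, x, n, num_vars)[-1]
    return program[last[0] - 1][0] == 5


def d(ids):
    """Extract the value of X1 from the last ID."""
    return ids[-1][1]


def universal(program, x, num_vars=1):
    """d((mu n) T(program, x, n)); loops forever if the program never stops on x."""
    n = next(n for n in count(2) if T(program, x, n, num_vars))
    return d(trace(program, x, n, num_vars))


ADD_TWO = [(2, 1, 1), (2, 2, 1), (5, 3)]          # X1 <- X1 + 1, twice, then stop
```

For instance, `universal(ADD_TWO, 3)` evaluates to 5. The search starts at n = 2 because a terminating computation has more than one ID.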


A.3.74 Definition. (Rogers's $\phi$-Notation ([41])) We denote by $\phi_z$ the zth partial recursive function, in the sense that, for all z, $\phi_z = \lambda x.d((\mu y)T(z, x, y))$. □


A.3.75 Remark. (1) From Definition A.3.67 it is clear that not every $z \in \mathbb{N}$ represents a URM. Nevertheless, Definition A.3.74 indexes all partial computable functions using all numbers from $\mathbb{N}$ as indices, not just those z that represent URMs. This is so because the term “$d((\mu y)T(z, x, y))$” is meaningful for any z. Thus, if z is not a URM code, then $T(z, x, y)$ will simply be false for all x and all y; thus $\phi_z(x)\uparrow$ for all x. This is fine! Indeed it is consistent with the phenomenon where a real-life computer program that is not syntactically correct (like our z here) will not be translated by the compiler and thus will not run. Therefore, for any input it will decline to offer an output; the corresponding function will be totally undefined.

(2) Definition A.3.2, for the unary case, can now be rephrased as “$\lambda x.f(x) \in \mathcal{P}$ iff, for some $z \in \mathbb{N}$, $f = \phi_z$”. We say that z is a “$\phi$-index” of f. □


A.3.76 Exercise. Prove that every function of $\mathcal{P}$ has infinitely many $\phi$-indices.
Hint. There are infinitely many ways to modify a program and yet have all programs so obtained compute the same function. □


A.3.77 Example. The nowhere-defined function can also be obtained from a program that compiles all right. Setting $S = \lambda yx.x + 1$ we note:

(1) $\lambda x.(\mu y)S(y, x) \in \mathcal{P}$ by A.3.32 and A.3.18.

(2) By the techniques of A.3.16 we can write a program for $\lambda x.(\mu y)S(y, x)$.

As a side-effect we have that $\mathcal{PR} \neq \mathcal{P}$ and $\mathcal{R} \neq \mathcal{P}$. □


A.3.78 Exercise. (URM-independent Characterization of $\mathcal{P}$) Define the concept of $\mathcal{P}$-derivations as in A.3.20, however, adding a 4th case of what we may write at each step: We may also write $\lambda \vec{x}.(\mu y)f(y, \vec{x})$, if $\lambda y\vec{x}.f(y, \vec{x})$ is already written.

Prove:

(1) $f \in \mathcal{P}$ (as in Definition A.3.2) iff f appears in some $\mathcal{P}$-derivation.

(2) Of all possible sets of functions that include Z and are closed under primitive recursion, composition and unbounded search, $\mathcal{P}$ is the smallest with respect to inclusion. □


A.3.79 Remark. (Case of Many Inputs) It has been convenient to present the normal form theorem, in particular the Kleene predicate, in terms of unary (1-ary) functions. It is good to know that in doing so, in the presence of coding, we did not restrict generality at all. Indeed, if $\lambda \vec{x}_n.f(\vec{x}_n) \in \mathcal{P}$, then so is $g = \lambda z.f((z)_0, \ldots, (z)_{n-1})$ by composition. Thus, every n-input computable function is expressible via coding through a unary computable function: For all $\vec{x}_n$, we have $f(\vec{x}_n) \simeq g(\langle \vec{x}_n \rangle)$.

In particular, if $g = \phi_i$, then for all $\vec{x}_n$, we have $f(\vec{x}_n) \simeq \phi_i(\langle \vec{x}_n \rangle)$ and hence (by A.3.72) $f(\vec{x}_n) \simeq d((\mu y)T(i, \langle \vec{x}_n \rangle, y))$. We call i a $\phi$-index of f.

It is customary in the literature to introduce the notations (for $n \ge 1$) $T^{(n)}(z, \vec{x}_n, y)$, the n-input Kleene predicate, as shorthand for $T(z, \langle \vec{x}_n \rangle, y)$, and $\phi_a^{(n)}$ as shorthand for $\lambda \vec{x}_n.\phi_a(\langle \vec{x}_n \rangle)$. Thus the n-variable normal form theorem can be expressed as: For all $\vec{x}_n$, $\phi_z^{(n)}(\vec{x}_n) \simeq d((\mu y)T^{(n)}(z, \vec{x}_n, y))$. □


A.3.4 Semi-computable Relations; Unsolvability


We next define a $\mathcal{P}$-counterpart of $\mathcal{R}_*$ and $\mathcal{PR}_*$ and look into some of its closure properties.


A.3.80 Definition. (Semi-computable Relations) A relation $P(\vec{x}_n)$ is called semi-computable iff for some $f \in \mathcal{P}$, we have, for all $\vec{x}_n$,


$$P(\vec{x}_n) \equiv f(\vec{x}_n)\downarrow \qquad (1)$$


The set of all semi-computable relations is denoted by $\mathcal{P}_*$.$^{23}$


If $f = \phi_a^{(n)}$ in (1) above, then we say that “a is a semi-computable index or just a semi-index of $P(\vec{x}_n)$”. If n = 1 (thus $P \subseteq \mathbb{N}$) and a is one of the semi-indices of P, then we write $P = W_a$ ([41]). □


We have at once: 


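A.3.81 Theorem. (Normal Form Theorem for Semi-Computable Relations) $P(\vec{x}_n) \in \mathcal{P}_*$ iff for some $a \in \mathbb{N}$, we have (for all $\vec{x}_n$) $P(\vec{x}_n) \equiv (\exists z)T^{(n)}(a, \vec{x}_n, z)$.


Proof. Only if-part. Let $P(\vec{x}_n) \equiv f(\vec{x}_n)\downarrow$, with $f \in \mathcal{P}$. By A.3.79, $f = \phi_a^{(n)}$ for some $a \in \mathbb{N}$.
If-part. By A.3.80 and A.3.79, $P(\vec{x}_n) \equiv \phi_a^{(n)}(\vec{x}_n)\downarrow$. But $\phi_a^{(n)} \in \mathcal{P}$. □


Rephrasing the above (hiding the “a” and remembering that $\mathcal{PR}_* \subseteq \mathcal{R}_*$), we have:


A.3.82 Corollary. (Strong Projection Theorem) $P(\vec{x}_n) \in \mathcal{P}_*$ iff, for some recursive predicate $Q(\vec{x}_n, z)$, we have (for all $\vec{x}_n$) $P(\vec{x}_n) \equiv (\exists z)Q(\vec{x}_n, z)$.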


Proof. For the only if, take $Q(\vec{x}_n, z)$ to be $\lambda \vec{x}_n z.T^{(n)}(a, \vec{x}_n, z)$ for appropriate $a \in \mathbb{N}$. For the if, take $f = \lambda \vec{x}_n.(\mu z)Q(\vec{x}_n, z)$. Then $f \in \mathcal{P}$ and $P(\vec{x}_n) \equiv f(\vec{x}_n)\downarrow$. □


23 We are making this symbol up. It is not standard in the literature.


A.3.83 Remark. (Deciders and Verifiers) A computable relation $P(\vec{x}_n)$ is, by definition, one for which $\chi_P \in \mathcal{R}$; thus it has an associated URM M that decides membership of any $\vec{a}_n$ in P:$^{24}$ “yes” (output 0) if it is in, “no” (output 1) if it is not. Thus this M is a decider for $P(\vec{x}_n)$.

A semi-computable relation $Q(\vec{x}_m)$, on the other hand, comes equipped only with a verifier, i.e., a URM N that verifies $\vec{a}_m \in Q$, if true, by virtue of halting on input $\vec{a}_m$.

While mathematically speaking $\vec{a}_m \notin Q$ is also “verified” by virtue of looping forever on input $\vec{a}_m$, algorithmically speaking this is no verification at all as we do not have a way of knowing whether N is looping forever as opposed to being awfully sluggish and being about to halt in a couple of trillion years (cf. halting problem A.3.88).

In the algorithmic sense, a verifier (of a semi-computable set of m-tuples) verifies only the “yes” instances of questions such as “Is $\vec{a}_m \in Q$?” □


Clearly, though, if we have a verifier for a relation $Q(\vec{x}_n)$ and also have a verifier for its complement $\neg Q(\vec{x}_n)$, then we can build a decider for $Q(\vec{x}_n)$: On input $\vec{a}_n$ we simply run both verifiers simultaneously. If the one for Q halts, we print 0 and stop the computation; if the one for $\neg Q$ halts, we print 1 and stop. This computes $\chi_Q(\vec{a}_n)$. Put more mathematically,


A.3.84 Proposition. If $Q(\vec{x}_n)$ and $\neg Q(\vec{x}_n)$ are in $\mathcal{P}_*$, then both are in $\mathcal{R}_*$.


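Proof. Let i and j be semi-indices of Q and $\neg Q$ respectively, that is (A.3.81),

$$Q(\vec{x}_n) \equiv (\exists z)T^{(n)}(i, \vec{x}_n, z) \quad \text{and} \quad \neg Q(\vec{x}_n) \equiv (\exists z)T^{(n)}(j, \vec{x}_n, z)$$

Define

$$g = \lambda \vec{x}_n.(\mu z)\big(T^{(n)}(i, \vec{x}_n, z) \lor T^{(n)}(j, \vec{x}_n, z)\big)$$

Trivially, $g \in \mathcal{P}$. Hence, $g \in \mathcal{R}$, since it is total (why?). We are done by noticing that $Q(\vec{x}_n) \equiv T^{(n)}(i, \vec{x}_n, g(\vec{x}_n))$. By closure properties of $\mathcal{R}_*$ (A.3.41), $\neg Q(\vec{x}_n)$ is in $\mathcal{R}_*$, too. □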
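The idea of running two verifiers simultaneously can be sketched in Python. This is an illustration under assumptions, not the URM construction: a verifier is modelled as a generator that yields once per step and returns when it halts, and the example verifiers `verify_even` and `verify_odd` are hypothetical stand-ins for verifiers of Q and of its complement.

```python
def decide(verifier_yes, verifier_no, a):
    """Alternate one step of each verifier; return 0 if the first halts, 1 if the second does."""
    run_yes, run_no = verifier_yes(a), verifier_no(a)
    while True:
        try:
            next(run_yes)                 # one step of the verifier for Q
        except StopIteration:
            return 0                      # "yes": a is in Q
        try:
            next(run_no)                  # one step of the verifier for the complement
        except StopIteration:
            return 1                      # "no": a is not in Q


def verify_even(a):
    """Halts exactly when a is even; loops forever otherwise."""
    while a % 2 != 0:
        yield


def verify_odd(a):
    """Halts exactly when a is odd; loops forever otherwise."""
    while a % 2 == 0:
        yield
```

Since every input is a “yes” instance for exactly one of the two verifiers, the loop always terminates, which mirrors why g is total in the proof.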


A.3.85 Proposition. $\mathcal{R}_* \subseteq \mathcal{P}_*$.


Proof. Let $Q(\vec{x}) \in \mathcal{R}_*$ and y be a new variable (other than any of the $\vec{x}$). By “$\vdash A \equiv (\exists x)A$ if x is not free in A” (cf. p. 188, Exercise 11) and soundness, we have that $Q(\vec{x}) \equiv (\exists y)Q(\vec{x})$ is true in the metatheory (where we are developing this section on computability). By A.3.82, $Q(\vec{x}) \in \mathcal{P}_*$. □


24 “$\vec{a}_n \in P$” (set notation) is synonymous with “$P(\vec{a}_n)$ holds” or just “$P(\vec{a}_n)$” (relational notation).


A.3.86 Definition. (Unsolvable or Undecidable Problems) A problem is a question “$\vec{x}_n \in R$?” for some set of n-tuples R. “The problem $\vec{x}_n \in R$ is recursively unsolvable”, or just unsolvable, or undecidable, means that the set R (equivalently, the relation $\vec{x}_n \in R$ or $R(\vec{x}_n)$) is not in $\mathcal{R}_*$. Put colloquially, there is no URM-programmable solution for the problem; there is no decider for the question “Is $\vec{x}_n \in R$?”.

The halting problem has central significance in computability. It is the question whether “program x will ever halt if it starts computing on input x”. That is, if we set $K = \{x : \phi_x(x)\downarrow\}$, then the halting problem is $x \in K$. We denote the complement of K by $\overline{K}$. □


A.3.87 Exercise. The halting problem $x \in K$ is semi-recursive.
Hint. The problem is “$\phi_x(x)\downarrow$”. Now invoke the normal form theorem (A.3.72(1)). □


A.3.88 Theorem. (Unsolvability of the Halting Problem) The halting problem is 
unsolvable. 


Proof. In view of the preceding exercise (and A.3.84), it suffices to show that $\overline{K}$ is not semi-computable. Suppose instead that i is a semi-index of the set. Thus, $x \in \overline{K} \equiv (\exists z)T(i, x, z)$, or, making the part $x \in \overline{K}$ (that is, $\phi_x(x)\uparrow$) explicit:


$$\neg(\exists z)T(x, x, z) \equiv (\exists z)T(i, x, z) \qquad (1)$$

Substituting i into x in (1) we get a contradiction. □


A.3.89 Remark. (1) Since $K \in \mathcal{P}_*$, we conclude that the inclusion $\mathcal{R}_* \subseteq \mathcal{P}_*$ (A.3.85) is proper, i.e., $\mathcal{R}_* \subsetneq \mathcal{P}_*$.

(2) The characteristic function of K provides an example of a total uncomputable function.

(3) In A.3.34 we saw an example of how to remove “points of nondefinition” from a function so that it remains computable but has been now extended to a total function. Can we always do that? No; for example, the function $f = \lambda x.\phi_x(x) + 1$ cannot be extended to a total computable function. Of course, by A.3.72, $f \in \mathcal{P}$. Here is why: Suppose that $g \in \mathcal{R}$ extends f. Thus, $g = \phi_i$ for some i. Let us look at g(i): Since g is total, $\phi_i(i)\downarrow$, and we have


$$g(i) = \phi_i(i) \neq \phi_i(i) + 1 = f(i)$$

But since $f(i)\downarrow$, we also have $g(i) = f(i)$ as g extends f; a contradiction. □


A.3.90 Theorem. (Closure Properties of $\mathcal{P}_*$) $\mathcal{P}_*$ is closed under $\lor$, $\land$, $(\exists y)_{<z}$, $(\exists y)$, and $(\forall y)_{<z}$. It is not closed under either $\neg$ or $(\forall y)$.


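Proof. Given semi-computable relations $P(\vec{x}_n)$, $Q(\vec{y}_m)$ and $R(y, \vec{u}_k)$ of semi-indices p, q, r, respectively. In each case we will express the relation we want to prove semi-computable as a strong projection (A.3.82):

Case $\lor$:
$$\begin{aligned}
P(\vec{x}_n) \lor Q(\vec{y}_m) &\equiv (\exists z)T^{(n)}(p, \vec{x}_n, z) \lor (\exists z)T^{(m)}(q, \vec{y}_m, z) \\
&\equiv (\exists z)\big(T^{(n)}(p, \vec{x}_n, z) \lor T^{(m)}(q, \vec{y}_m, z)\big)
\end{aligned}$$

Case $\land$:
$$\begin{aligned}
P(\vec{x}_n) \land Q(\vec{y}_m) &\equiv (\exists z)T^{(n)}(p, \vec{x}_n, z) \land (\exists z)T^{(m)}(q, \vec{y}_m, z) \\
&\equiv (\exists w)\big((\exists z)_{<w}T^{(n)}(p, \vec{x}_n, z) \land (\exists z)_{<w}T^{(m)}(q, \vec{y}_m, z)\big)
\end{aligned}$$

Case $(\exists y)_{<z}$:
$$\begin{aligned}
(\exists y)_{<z}R(y, \vec{u}_k) &\equiv (\exists y)_{<z}(\exists w)T^{(k+1)}(r, y, \vec{u}_k, w) \\
&\equiv (\exists w)(\exists y)_{<z}T^{(k+1)}(r, y, \vec{u}_k, w)
\end{aligned}$$

Case $(\exists y)$:
$$\begin{aligned}
(\exists y)R(y, \vec{u}_k) &\equiv (\exists y)(\exists w)T^{(k+1)}(r, y, \vec{u}_k, w) \\
&\equiv (\exists z)(\exists y)_{<z}(\exists w)_{<z}T^{(k+1)}(r, y, \vec{u}_k, w)
\end{aligned}$$

Case $(\forall y)_{<z}$:
$$\begin{aligned}
(\forall y)_{<z}R(y, \vec{u}_k) &\equiv (\forall y)_{<z}(\exists w)T^{(k+1)}(r, y, \vec{u}_k, w) \\
&\equiv (\exists v)(\forall y)_{<z}(\exists w)_{<v}T^{(k+1)}(r, y, \vec{u}_k, w)
\end{aligned}$$

As for possible closure under $\neg$ and $\forall y$, K provides a counterexample to $\neg$, and $\neg T(x, x, y)$ provides a counterexample to $\forall y$. □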


A.3.91 Remark. (Computably Enumerable Sets) There is an interesting characterization of nonempty semi-computable sets that is found in all introductions to the theory of computation. These sets are precisely those that can be “enumerated effectively” or “computably”, that is,

A nonempty set $S \subseteq \mathbb{N}$ is semi-computable iff some $f \in \mathcal{PR}$ has S as its set of outputs, or range as we say technically.

Indeed, assume first that, for some semi-index i, $x \in S \equiv (\exists y)T(i, x, y)$ for all x. Intuitively now, we can enumerate all pairs x, y of numbers, coded as “$\langle x, y \rangle$”, and for every pair that satisfies $T(i, x, y)$ output x.

Rigorously, this is accomplished by f given for all z as follows:


$$f(z) = \begin{cases} (z)_0 & \text{if } T(i, (z)_0, (z)_1) \\ a & \text{otherwise} \end{cases}$$


where “a” is some fixed member of S that we keep outputting every time the condition “$T(i, x, y)$” fails,$^{25}$ ensuring that f is total. I wrote x for $(z)_0$ and y for $(z)_1$ to connect with the preceding intuitive construction. Of course f is primitive recursive.


25 Because either we did not let the computation of $\phi_i(x)$ go on long enough, or no terminating computation exists.


Conversely, if S is the range of some primitive recursive g, that is, $x \in S \equiv (\exists y)g(y) = x$ holds for all x, we immediately get that S is semi-recursive by A.3.82, since the graph of g, $g(y) = x$, is in $\mathcal{PR}_*$ (cf. A.3.45).

This result justifies the nomenclature computably enumerable (c.e.) and also recursively enumerable (r.e.) for all semi-computable sets (the nomenclature applies to the empty set as well on the understanding that its members can trivially be enumerated by doing nothing). There is no loss of generality in presenting the characterization for subsets of $\mathbb{N}$ since via coding (...) it can be trivially and naturally extended to sets of n-tuples for n > 1. □
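The enumerating function f of this remark can be sketched in Python. The sketch is only an analogy under assumptions: the Cantor unpairing function stands in for $z \mapsto ((z)_0, (z)_1)$, and the hypothetical predicate `witness(x, y)` (“y is a square root of x”) stands in for $T(i, x, y)$, so that S is the set of perfect squares and a = 0.

```python
from math import isqrt


def unpair(z):
    """Inverse of the Cantor pairing; an assumed stand-in for z -> ((z)_0, (z)_1)."""
    w = (isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y


def witness(x, y):
    """Stand-in for T(i, x, y): y certifies that x belongs to S (here: y * y == x)."""
    return y * y == x


def f(z, a=0):
    """Total enumerator: output x when the pair (x, y) coded by z is verified, else a."""
    x, y = unpair(z)
    return x if witness(x, y) else a
```

Running f over z = 0, 1, 2, ... outputs only members of S, and every member of S eventually appears, because each pair (x, y) is coded by some z.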


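A.3.92 Exercise. Prove that if $\lambda \vec{x}.f(\vec{x}) \in \mathcal{P}$, then its graph $y = f(\vec{x})$ is in $\mathcal{P}_*$. □

A.3.93 Exercise. Prove that if $y = f(\vec{x})$ is in $\mathcal{P}_*$, then $\lambda \vec{x}.f(\vec{x}) \in \mathcal{P}$. □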


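A.3.94 Exercise. (Definition by Positive Cases) Consider a set of mutually exclusive relations $R_i(\vec{x})$, $i = 1, \ldots, n$, that is, $R_i(\vec{x}) \land R_j(\vec{x})$ is false for each $\vec{x}$ as long as $i \neq j$.

Then we can define a function f by positive cases $R_i$ from given functions $f_i$ by the requirement (for all $\vec{x}$) given below:


$$f(\vec{x}) = \begin{cases} f_1(\vec{x}) & \text{if } R_1(\vec{x}) \\ f_2(\vec{x}) & \text{if } R_2(\vec{x}) \\ \;\;\vdots & \;\;\vdots \\ f_n(\vec{x}) & \text{if } R_n(\vec{x}) \\ \uparrow & \text{otherwise} \end{cases}$$


Prove that if each $f_i$ is in $\mathcal{P}$ and each of the $R_i(\vec{x})$ is in $\mathcal{P}_*$, then $f \in \mathcal{P}$.
Hint. Use A.3.92 and A.3.93 along with closure properties of $\mathcal{P}_*$ relations. □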


A.4 GÖDEL'S FIRST INCOMPLETENESS THEOREM


We prove here a semantic version of Gödel's first incompleteness theorem that relies on computability techniques and the semantic notion of correctness of an axiomatic system. In this form the theorem states that any “reasonable” axiomatic system that attempts to have as theorems precisely all the “true” (first-order) formulae of arithmetic will fail: There will be infinitely many true formulae that are not theorems. The qualifier reasonable could well be replaced by practical: One must be able to tell, algorithmically, whether a formula is an axiom (how else can one check a proof, let alone write one?). “True” means true in the standard interpretation $\mathfrak{N} = (\mathbb{N}, M)$ (given below).

To set the stage more precisely, we will need some definitions and notation. In 
order to do arithmetic, we first need a first-order (logical) language that we use 
to write down formulae and proofs. The alphabet of arithmetic has as nonlogical 
symbols the following: 

0, S, +, ×, <




These nonlogical symbols we can, of course, interpret in any way we please. However, the standard interpretation is given by the table below:


Abstract (language) symbol     Standard interpretation
0                              0 (zero)
S                              $\lambda x.x + 1$
+                              $\lambda xy.x + y$
×                              $\lambda xy.x \times y$
<                              $\lambda xy.x < y$


The alphabet has only one constant symbol; however, an arbitrary $n \in \mathbb{N}$ can be captured formally in the language by the string


$$\underbrace{SS \cdots S}_{n \text{ times}}0$$


which we will denote by $\tilde{n}$ in order to distinguish it from the informal (metamathematical) name n. Thus, any axiomatic system (or theory) that we use to formally prove theorems of arithmetic will contain:


(1) The first-order alphabet described above$^{26}$


(2) The language, that is, the sets of well-formed formulae (WFF) and of terms, Term


(3) A distinguished recursive subset of WFF: the special (or nonlogical) axioms for arithmetic$^{27}$


(4) Another distinguished subset of WFF: the logical axioms (4.2.1)


(5) The rule of inference: modus ponens


A.4.1 Remark. Several observations will be helpful: 


(i) We required that any axiomatic system we devise for arithmetic is “practical” (or “reasonable”). As we noted above, we understand this “reasonableness” as a promise to have a decider for axioms. This is why in (3) above we ask for a recursive set of (special) axioms.


This immediately raises the question, “But is not a ‘decider’ a URM that expects numerical inputs, as opposed to string inputs?”


There is no difference in principle, since a number (e.g., if written in standard decimal notation) is a string over {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and, conversely, any string over {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} naturally represents a number.


26 Of course, this language has the standard logical part that any first-order language has (cf. 4.1.2).

27 A particularly famous choice of axioms is due to Peano: the so-called Peano arithmetic (PA). It has axioms that give the behavior of every nonlogical symbol, plus the induction axiom schema:

$$A[x := 0] \land (\forall x)(A \to A[x := Sx]) \to (\forall x)A$$

This schema gives one axiom for each choice of the formula A.


(ii) Applying this observation more generally to strings over any finite alphabet, not just over {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, we will take a more careful approach that disallows “0” as a digit, because its presence has undesirable side-effects.


For this reason we act as follows: Given an alphabet of b > 1 symbols, we first fix an order of its members and assign to every symbol, as its value, its position number in the order, that is, 1, 2, 3, ..., b. Then any string $a_n a_{n-1} \cdots a_1 a_0$ of symbols $a_i$ over this alphabet can be thought of as a number expressed in so-called “b-adic” notation (b being the “base” of the notation), where the symbols $a_i$ are b-adic digits and the leftmost symbol is the most significant:


$$a_0 + a_1 b^1 + a_2 b^2 + \cdots + a_n b^n$$


Conversely, any positive integer can be expressed in a unique way in b-adic notation (i.e., unique b-adic digits $a_i$ can be found) as above (cf. [39, 1, 47, 49] and Remark A.4.2 below).


In our present context, let us fix an order for the (finite version of the) alphabet of arithmetic displayed as (1) below. This alphabet has 35 members (separated by commas) that we view as 35-adic digits of value, each, equal to its position in (1). For example, 1 is the value of digit “x”, 11 of digit “′”, 12 of digit “0”, 35 of digit “<”:$^{28}$


x, y, z, u, v, w, =, p, q, r, ′, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
⊤, ⊥, (, ), ¬, ∧, ∨, →, ≡, ∀, S, +, ×, <          (1)


Boolean variables are generated as in (4), p. 94, while object variables as in the footnote 87, p. 115. Thus any string over the alphabet (1) denotes a number in “b-adic” notation,$^{29}$ where b = 35. For example, the formulae (∀x′)x′ = x′ and 0 < 1 have numerical values


$$11 + 1 \cdot 35^1 + 7 \cdot 35^2 + 11 \cdot 35^3 + 1 \cdot 35^4 + 25 \cdot 35^5 + 11 \cdot 35^6 + 1 \cdot 35^7 + 31 \cdot 35^8 + 24 \cdot 35^9$$


and

$$13 + 35 \cdot 35^1 + 12 \cdot 35^2$$


respectively, that is, in standard decimal notation, 1961469340480871 and 15938 respectively.
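The two numerical values can be checked with a short Python sketch. Assumptions are flagged in the comments: the list `ALPHABET` reproduces the order of (1), where positions 22, 23 and 26 to 30 are read off a damaged display and are not exercised by the examples; the ASCII apostrophe stands for the prime symbol; the name `badic_value` is hypothetical.

```python
# Order of alphabet (1); the value of a symbol is its 1-based position.
# Assumption: entries 22-23 and 26-30 follow the (partly illegible) printed order.
ALPHABET = [
    "x", "y", "z", "u", "v", "w", "=", "p", "q", "r", "'",
    "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
    "⊤", "⊥", "(", ")", "¬", "∧", "∨", "→", "≡", "∀", "S", "+", "×", "<",
]
VALUE = {symbol: position + 1 for position, symbol in enumerate(ALPHABET)}


def badic_value(string, b=35):
    """Number denoted by `string` in b-adic notation (leftmost symbol most significant)."""
    n = 0
    for symbol in string:
        n = n * b + VALUE[symbol]         # Horner evaluation; no digit has value 0
    return n
```

With this reading, `badic_value("0<1")` gives 15938 and `badic_value("(∀x')x'=x'")` gives 1961469340480871, in agreement with the values quoted in the text.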


28 Programmers are aware of “hexadecimal” notation, that is, notation base-16, where the digits that are 
allowed are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c,d, e, f. Rather than using as digits “10”, “15” etc., one uses 
“a” and “f” respectively to avoid the ambiguities of string notation that does not use separators between 
the symbols of a string. 

29 Sometimes we say a “b-adic number”, just as we say “binary number”, “hexadecimal number”.


Pause. So what are the issues with the digit “0”? Why not number the symbols in (1) by 0 through 34 and work in ordinary base 35, named usually b-ary,$^{30}$ instead? Because we will have trouble with strings like x1 < 0. The “digit” in the most significant position is (of value) 0 and we lose information as we pass to the string's numerical value. That is, both x1 < 0 and 1 < 0 denote the same number. Correspondingly, associativity of concatenation will fail. Concatenating the digits “y” and “x” and “y” of the alphabet, first as (yx)y and then as y(xy), yields different (numerical) results, the first $b^2 + 1$, the second $b + 1$.


At the end of this discussion we see that to speak of a set of strings over the 
alphabet (1) is the same as talking about a subset of N, while terminology such as 
“the set WFF is decidable (or primitive recursive, or semi-computable, as the case 
may be)” is now technically meaningful. 


(iii) Now that speaking about “recursive sets of strings” makes sense, we note further that the sets WFF, Term, and $\Lambda_1$ (logical axioms, cf. 4.2.1) are all recursive. Indeed, at the intuitive level we see that we can parse a string t algorithmically (that is, we can write a URM that will do it for us) to decide the question “Is t ∈ Term?” Rather than specifying what such a program might look like, I will outline how the characteristic function of the set Term, $\chi_{Term}$, will be defined by course-of-values recursion from functions and relations known to be primitive recursive (in which case we just invoke A.3.65).


To this end we just follow the definition (4.1.7) and define as follows, (a)—(d), 
being careful to use boldface type for variable names, t,x, y,z etc., so that 
there will be no clash with the 35-adic digits x,y, z, u,v, w—which are just 
names for the numbers 1, 2,3, 4,5, 6—of the alphabet (1) (p. 265): 


30 “b-ary” signifies base-b and digit range 0 to b − 1. Cf. the term binary. “b-adic” signifies base-b but digit range 1 to b. Using digits 1 and 2, base-2, we have dyadic notation, not binary. The term “b-adic” is due to Smullyan [47]. Interestingly, the suffix adic is Greek for ary: Cf. dyadic vs. binary (dyo or “δύο” is Greek for two).


(a) If t is a number whose 35-adic representation is a variable or a constant, then we let $\chi_{Term}(\mathbf{t}) = 0$.


(b) If t is a number whose 35-adic representation is Sx, for some string x (over alphabet (1)), then we let $\chi_{Term}(\mathbf{t}) = 0$ precisely when $\chi_{Term}(\mathbf{x}) = 0$.


(c) If t is a number whose 35-adic representation is +z, for some string z, then we let $\chi_{Term}(\mathbf{t}) = 0$ precisely when

$$(\exists \mathbf{x}, \mathbf{y})_{<\mathbf{t}}\big(\chi_{Term}(\mathbf{x}) = 0 \land \chi_{Term}(\mathbf{y}) = 0 \land \mathbf{t} \text{ (expressed in 35-adic notation) is the string } {+}\mathbf{x}\mathbf{y}\big)$$


(d) If t is a number whose 35-adic representation is ×z, for some string z, then we let $\chi_{Term}(\mathbf{t}) = 0$ precisely when

$$(\exists \mathbf{x}, \mathbf{y})_{<\mathbf{t}}\big(\chi_{Term}(\mathbf{x}) = 0 \land \chi_{Term}(\mathbf{y}) = 0 \land \mathbf{t} \text{ (expressed in 35-adic notation) is the string } {\times}\mathbf{x}\mathbf{y}\big)\;{}^{31}$$

(e) We set $\chi_{Term}(\mathbf{t}) = 1$ in all other cases.
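Cases (a) through (e) can be mirrored by a small Python recognizer. This is a sketch at the level of strings rather than of their 35-adic numerical values, with stated assumptions: the variable syntax (one of the letters x, y, z, u, v, w, then optional primes, then an optional numeral not starting with 0) is the one described by Var(x) in Remark A.4.2, the ASCII apostrophe stands for the prime, and the names `VAR` and `is_term` are hypothetical.

```python
import re
from functools import lru_cache

# Assumed variable syntax: letter, then primes, then an optional numeral without leading 0.
VAR = re.compile(r"[xyzuvw]'*([1-9][0-9]*)?")


@lru_cache(maxsize=None)
def is_term(t):
    """True iff the string t is a term in prefix notation (cases (a)-(e))."""
    if t == "0" or VAR.fullmatch(t):              # (a) constant or variable
        return True
    if t.startswith("S"):                         # (b) t = Sx
        return is_term(t[1:])
    if t[:1] in ("+", "×"):                       # (c), (d) t = +xy or t = ×xy
        rest = t[1:]
        # try every split of the remainder, like the bounded existential quantifiers
        return any(is_term(rest[:k]) and is_term(rest[k:]) for k in range(1, len(rest)))
    return False                                  # (e) all other cases
```

Every recursive call is on a strictly shorter string, which is the string-level counterpart of the bound “< t” that makes the recursion course-of-values.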


Similarly, one can outline a course-of-values recursion for the characteristic function 
of WFF, thus showing the primitive recursiveness of the set (equivalently, of the 
membership relation). Finally, the logical axioms, i.e., the partial generalizations 
of formulae in the groups Ax1-Ax6, can be algorithmically recognized from their 
shape (Ax2—Ax6) or via truth tables (Ax1) once the partial generalization prefix is 
removed. 

It is also worth noting that the only rule of inference (thinking of logic 2, cf. Chapter 5) is algorithmically applicable. Indeed, the configuration


$$A \to B, A \vdash B$$


is recognized by its form. Thus the relation MP(x, y, z) on numbers (that is true precisely when the numbers x, y, z expressed in 35-adic notation have the forms $A \to B$, A and B respectively, for some formulae A and B) is recursive (decidable). □


A.4.2 Remark. (Digression) (1) One theorem of Euclid states that having fixed a b > 1 (from $\mathbb{N}$), any $n \in \mathbb{N}$ has a unique quotient and remainder, that is, unique q and r exist, where $0 \le r < b$, such that $n = bq + r$.
One immediately obtains, for the same b > 1, and any n > 0 a unique representation

$$n = bx + v, \text{ where } 0 < v \le b, \text{ and both } x \text{ and } v \text{ are in } \mathbb{N} \qquad (*)$$


The existence part in (*) follows from Euclid's theorem: Given n > 0, we have q and r, where $0 \le r < b$, such that $n = bq + r$. If $r \neq 0$, take $x = q$ and $v = r$. Otherwise, since $q \neq 0$ (why?), take $x = q - 1$ and $v = b$. Uniqueness is settled by Euclid's old argument: If $bx + v = bx' + v'$, for $0 < v \le b$ and $0 < v' \le b$, then b is a factor of the absolute difference $|v - v'|$. As $|v - v'| < b$ (why?), this forces $v = v'$. Trivially, then, $x = x'$ as well.
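Applying (*) repeatedly peels off b-adic digits one at a time, least significant first. A minimal Python sketch of that process follows; the function name `badic_digits` is hypothetical.

```python
def badic_digits(n, b):
    """b-adic digits of n, most significant first, each in the range 1..b."""
    digits = []
    while n > 0:
        q, r = divmod(n, b)                           # Euclid: n = b*q + r, 0 <= r < b
        x, v = (q, r) if r != 0 else (q - 1, b)       # (*): n = b*x + v, 0 < v <= b
        digits.append(v)
        n = x
    return digits[::-1]
```

For example, with b = 2 (dyadic notation, digits 1 and 2) the number 4 comes out as the digit string 1, 2, since 4 = 1·2 + 2, and the number 0 gets the empty digit string.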

Statement (*) leads to the existence and uniqueness of b-adic representations for positive integers exactly in the same manner Euclid's theorem induces the existence and uniqueness of b-ary representations for nonnegative integers. Proceeding by strong induction we note that n = 1 has a b-adic representation: “1”. Admitting the claim for all $1 \le m < n$, and using (*), we have (I.H. applied to x) $n = b(a_0 + a_1 b^1 + \cdots + a_k b^k) + v$ for some appropriate b-adic digits $a_i$. This settles existence. For uniqueness let


$$a_0 + a_1 b^1 + \cdots + a_k b^k = c_0 + c_1 b^1 + \cdots + c_m b^m \qquad (**)$$

where $0 < a_i, c_j \le b$ for all i, j. By (*) we have $a_0 = c_0$. Thus


$$a_1 + a_2 b^1 + \cdots + a_k b^{k-1} = c_1 + c_2 b^1 + \cdots + c_m b^{m-1}$$


and, as above, $a_1 = c_1$. And so on (or use induction).


31 This “$\mathbf{x}, \mathbf{y} < \mathbf{t}$” is what makes the recursion “course-of-values”. In “+xy” we are using the formal prefix notation for terms rather than the friendly (but informal) “x + y”.


(2) Now we can go back and prove $\chi_{Term} \in \mathcal{PR}$ in detail. We will first develop some tools for manipulating b-adic numbers.$^{32}$ As before, we will typeset all the variables that we employ in boldface to distinguish them from the digits x, y, z, u, v, w of alphabet (1) p. 265.

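We fix the base b throughout our discussion and show that $\lambda \mathbf{x}.|\mathbf{x}|$, the length of the number x expressed in b-adic notation, is in $\mathcal{PR}$:


$$|0| = 0$$
$$|\mathbf{x} + 1| = \begin{cases} |\mathbf{x}| + 1 & \text{if } b^{|\mathbf{x}|+1} = \mathbf{x}(b - 1) + b \\ |\mathbf{x}| & \text{otherwise} \end{cases}$$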


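We next show that concatenation is a primitive recursive operation. We will denote by $\mathbf{x} * \mathbf{y}$ the (numerical) result of concatenating the b-adic representations of x and y in that order. Since $\mathbf{x} * \mathbf{y} = \mathbf{x}b^{|\mathbf{y}|} + \mathbf{y}$, we have $\lambda \mathbf{xy}.\mathbf{x} * \mathbf{y} \in \mathcal{PR}$. The absence of 0 digits makes clear that $(\mathbf{x} * \mathbf{y}) * \mathbf{z} = \mathbf{x} * (\mathbf{y} * \mathbf{z})$, thus writing “$\mathbf{x} * \mathbf{y} * \mathbf{z}$” is unambiguous.

Note that the number 0 behaves like the empty string, in terms both of its length and its behavior with respect to concatenation: $0 * \mathbf{y} = 0 \cdot b^{|\mathbf{y}|} + \mathbf{y} = \mathbf{y}$ and $\mathbf{x} * 0 = \mathbf{x}b^0 + 0 = \mathbf{x}$.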

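Let next $\mathbf{x}B\mathbf{y}$ be the relation “the b-adic representation of $\mathbf{x}$ is a prefix of that of
$\mathbf{y}$”. This relation is primitive recursive (cf. A.3.45) since $\mathbf{x}B\mathbf{y} \equiv (\exists \mathbf{z})_{\le \mathbf{y}}\, \mathbf{y} = \mathbf{x} * \mathbf{z}$.
Similarly for the “postfix relation” $\mathbf{x}E\mathbf{y}$, which means “the b-adic representation of
$\mathbf{x}$ is a postfix of that of $\mathbf{y}$”, and the “part of” relation, $\mathbf{x}P\mathbf{y}$, meaning “the b-adic
representation of $\mathbf{x}$ is a substring of that of $\mathbf{y}$”: $\mathbf{x}E\mathbf{y} \equiv (\exists \mathbf{z})_{\le \mathbf{y}}\, \mathbf{y} = \mathbf{z} * \mathbf{x}$ and
$\mathbf{x}P\mathbf{y} \equiv (\exists \mathbf{z})_{\le \mathbf{y}}(\mathbf{z}B\mathbf{y} \land \mathbf{x}E\mathbf{z})$.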

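Let now $d$ be any digit among the ones allowed in b-adic notation: $1, 2, \ldots, b$. The
relation $\lambda\mathbf{x}.tally_d(\mathbf{x})$ says that $\mathbf{x}$ is nonzero and all its digits (in b-adic notation) are the
same as $d$. This, too, is primitive recursive: $tally_d(\mathbf{x}) \equiv \mathbf{x} > 0 \land (\forall \mathbf{z})_{\le \mathbf{x}}(\mathbf{z}P\mathbf{x} \land |\mathbf{z}| = 1 \to \mathbf{z} = d)$.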
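
The bounded-quantifier definitions of prefix, postfix, part-of and tally can be transcribed almost literally. The sketch below is a brute-force illustration with names of our own choosing (`blen`, `is_prefix`, `is_postfix`, `is_part`, `tally`); each bounded quantifier becomes a finite search, so it is meant for small numbers only.

```python
def blen(x: int, b: int) -> int:
    """b-adic length of x (digits run over 1..b; 0 has length 0)."""
    n = 0
    while x > 0:
        x = (x - 1) // b
        n += 1
    return n


def concat(x: int, y: int, b: int) -> int:
    """x * y = x * b^|y| + y."""
    return x * b ** blen(y, b) + y


def is_prefix(x: int, y: int, b: int) -> bool:
    """x B y: there is z <= y with y = x * z."""
    return any(y == concat(x, z, b) for z in range(y + 1))


def is_postfix(x: int, y: int, b: int) -> bool:
    """x E y: there is z <= y with y = z * x."""
    return any(y == concat(z, x, b) for z in range(y + 1))


def is_part(x: int, y: int, b: int) -> bool:
    """x P y: there is z <= y with z B y and x E z."""
    return any(is_prefix(z, y, b) and is_postfix(x, z, b) for z in range(y + 1))


def tally(d: int, x: int, b: int) -> bool:
    """tally_d(x): x > 0 and every one-digit part of x equals d."""
    return x > 0 and all(
        z == d for z in range(x + 1) if is_part(z, x, b) and blen(z, b) == 1
    )
```

With $b = 3$ the number 18 is written “123”, so 5 (written “12”) is a prefix of it, 2 is a part of it, and 7 (written “21”) is not.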


Important Notational Convention: In expressing the various formulae and
functions we are consistently using the value of an alphabet symbol rather
than its name. Thus, rather than writing “$0B\mathbf{x}$” to indicate that the alphabet
symbol “0” begins the (b-adic representation of) $\mathbf{x}$, we write instead “$12B\mathbf{x}$”;
rather than $\mathbf{x} = x$, we write $\mathbf{x} = 1$, etc.


Most of the above machinery based on b-adic concatenation was developed by
Smullyan ([47]) and Bennett ([1]) and is retold in Tourlakis ([49]), the ideas having
originated in Quine ([39]).

To conclude our task, we will show first that the relation Var(x) that is true 
precisely when the b-adic notation of x has the syntax of an object variable (cf. 
footnote 87, p. 115) is in PR. We split our task into two subtasks: 


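32 That is, positive integers written in b-adic notation for some $b > 1$.

(i) The relation $Num(\mathbf{x})$ says that the b-adic notation of the number $\mathbf{x}$ is a string
over the subalphabet {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} of (1) that does not begin with 0.
We now have $Num(\mathbf{x}) \equiv \mathbf{x} > 0 \land \lnot\, 12^{33} B\mathbf{x} \land (\forall \mathbf{y})_{\le \mathbf{x}}(\mathbf{y}P\mathbf{x} \land |\mathbf{y}| = 1 \to \mathbf{y} = 12^{34} \lor \mathbf{y} = 13 \lor \mathbf{y} = 14 \lor \mathbf{y} = 15 \lor \mathbf{y} = 16 \lor \mathbf{y} = 17 \lor \mathbf{y} = 18 \lor \mathbf{y} = 19 \lor \mathbf{y} = 20 \lor \mathbf{y} = 21)$.
Clearly, $Num(\mathbf{x}) \in \mathcal{PR}_*$.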


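(ii) $Var(\mathbf{x}) \equiv (\exists \mathbf{y}, \mathbf{z})_{\le \mathbf{x}}\big((\mathbf{y} = 0 \lor tally_{11}(\mathbf{y})) \land (\mathbf{z} = 0 \lor Num(\mathbf{z})) \land (\mathbf{x} = 1 * \mathbf{y} * \mathbf{z} \lor \mathbf{x} = 2 * \mathbf{y} * \mathbf{z} \lor \mathbf{x} = 3 * \mathbf{y} * \mathbf{z} \lor \mathbf{x} = 4 * \mathbf{y} * \mathbf{z} \lor \mathbf{x} = 5 * \mathbf{y} * \mathbf{z} \lor \mathbf{x} = 6 * \mathbf{y} * \mathbf{z})\big)$.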


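Finally, revisiting $\chi_{Term}$ we see at once:


$\chi_{Term}(\mathbf{t}) =$
    if $Var(\mathbf{t}) \lor \mathbf{t} = 12$ then 0
    else if $(\exists \mathbf{x})_{< \mathbf{t}}(\mathbf{t} = 32^{36} * \mathbf{x} \land \chi_{Term}(\mathbf{x}) = 0)$ then 0
    else if $(\exists \mathbf{x}, \mathbf{y})_{< \mathbf{t}}(\mathbf{t} = 33 * \mathbf{x} * \mathbf{y} \land \chi_{Term}(\mathbf{x}) = 0 \land \chi_{Term}(\mathbf{y}) = 0)$ then 0
    else if $(\exists \mathbf{x}, \mathbf{y})_{< \mathbf{t}}(\mathbf{t} = 34 * \mathbf{x} * \mathbf{y} \land \chi_{Term}(\mathbf{x}) = 0 \land \chi_{Term}(\mathbf{y}) = 0)$ then 0
    else 1


□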
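
The course-of-values flavor of $\chi_{Term}$ can be seen in a small recognizer. The Python sketch below is not the primitive recursive definition itself: it works on tuples of digit values instead of on numbers, and it replaces each bounded existential quantifier by a loop over split points. The digit values 12, 32, 33, 34 and the letter values 1 to 6 are the ones quoted in the text; the value `MARK = 11` for the digit that is tallied inside a variable is purely an assumption made for this illustration, and all function names are ours.

```python
from functools import lru_cache

ZERO, S, PLUS, TIMES = 12, 32, 33, 34   # digit values quoted in the text
LETTERS = range(1, 7)                   # values of x, y, z, u, v, w
NUMERALS = range(12, 22)                # values of the digits 0..9
MARK = 11                               # ASSUMPTION: value of the tallied digit


def is_num(s: tuple) -> bool:
    """Nonempty string of decimal digits that does not begin with 0."""
    return len(s) > 0 and s[0] != ZERO and all(d in NUMERALS for d in s)


def is_var(s: tuple) -> bool:
    """A letter, then a (possibly empty) tally of MARK, then an optional Num."""
    if not s or s[0] not in LETTERS:
        return False
    rest = s[1:]
    k = 0
    while k < len(rest) and rest[k] == MARK:
        k += 1
    tail = rest[k:]
    return len(tail) == 0 or is_num(tail)


@lru_cache(maxsize=None)
def chi_term(s: tuple) -> int:
    """0 if the digit string s is a term in prefix notation, 1 otherwise."""
    if is_var(s) or s == (ZERO,):
        return 0
    if s and s[0] == S and chi_term(s[1:]) == 0:
        return 0
    if s and s[0] in (PLUS, TIMES):
        body = s[1:]
        # try every way of splitting the rest into two shorter strings
        for k in range(1, len(body)):
            if chi_term(body[:k]) == 0 and chi_term(body[k:]) == 0:
                return 0
    return 1
```

The loop over split points matters: the string with digit values 34, 1, 13, 12 (a product whose arguments are a subscripted variable and 0) is recognized only by the split after the second argument digit, since a numeric subscript may not begin with 0.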


A.4.3 Exercise. The reader is invited to similarly prove, in detail, that the relations
(1) $\mathbf{x} \in WFF$, which holds exactly when the number $\mathbf{x}$, expressed in b-adic
notation, is a formula, and
(2) $MP(\mathbf{x}, \mathbf{y}, \mathbf{z})$, which holds precisely when the numbers $\mathbf{x}, \mathbf{y}, \mathbf{z}$, expressed in
b-adic notation, are formulae of the forms $A$, $(A \to B)$ and $B$ respectively, are
primitive recursive. □


A.4.4 Exercise. Elegant as the primitive recursive definition of the b-adic length $|\mathbf{x}|$
may be, it is, in practical terms, computationally very inefficient. For one thing,
the recursion has as many levels as the value of $\mathbf{x}$. With more thought the depth
of recursion can be reduced to $|\mathbf{x}|$, which is approximately $\log_b(\mathbf{x})$. Effect such a
definition by first getting two primitive recursive functions quot and res such that for
any $\mathbf{x} > 0$ and $\mathbf{y} \ge 1$ we have $\mathbf{x} = quot(\mathbf{x}, \mathbf{y})\mathbf{y} + res(\mathbf{x}, \mathbf{y})$ with $0 < res(\mathbf{x}, \mathbf{y}) \le \mathbf{y}$.
Then observe that for any $\mathbf{x} > 0$, the b-adic length of $\mathbf{x}$ is one longer than that of
$quot(\mathbf{x}, b)$. □
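
A quick numerical check of the idea behind this exercise, offered as a sketch rather than as the requested primitive recursive derivation. It assumes that the remainder is meant to range over $1, \ldots, \mathbf{y}$, as b-adic digits do; `quot` and `res` are implemented with Python integer arithmetic, and `length_slow` and `length_fast` are names of our own.

```python
def quot(x: int, y: int) -> int:
    """Quotient that goes with a remainder in 1..y (x > 0, y >= 1)."""
    return (x - 1) // y


def res(x: int, y: int) -> int:
    """Remainder in 1..y, so that x = quot(x, y) * y + res(x, y)."""
    return x - quot(x, y) * y


def length_slow(x: int, b: int) -> int:
    """b-adic length by recursion on the value of x (as many steps as x)."""
    length = 0
    for k in range(x):
        if b ** (length + 1) == k * (b - 1) + b:
            length += 1
    return length


def length_fast(x: int, b: int) -> int:
    """b-adic length by recursion on the quotient (about log_b(x) steps)."""
    return 0 if x == 0 else 1 + length_fast(quot(x, b), b)
```

Note that the ordinary floor quotient would not do here: for $\mathbf{x} = b$ it returns 1, whose length is 1, although $b$ itself is a single b-adic digit.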


Let us now agree on a coding of proofs. Since a proof is a sequence of formulae,
$A_0, \ldots, A_n$, and as each formula can be naturally identified with a number, we will
code such sequences, and hence proofs, as $\langle A_0, \ldots, A_n \rangle$.


33 Digit “0”. 

34 Digit “0”. 

35 Digit “,” has value I.

36 32 is the value of digit “S”, 33 is that of “+” and 34 that of “×”.


A.4.5 Lemma. Given the stipulations (1)-(5) above (p. 264), the relation
inProof (y,x) 


which holds precisely when the number y codes a proof of the formula coded by x, 
is decidable. 


Proof. (Outline) We have enough confidence by now to accept that the relation
$\mathbf{x} \in \Lambda_1$, which holds exactly when the number $\mathbf{x}$ (expressed in b-adic notation) is a
logical axiom, is recursive (indeed with some effort one can prove that it is primitive
recursive). Recall that we have also stipulated that the set of special axioms of
arithmetic is decidable. That is, $SP(\mathbf{x})$ is recursive, where the relation holds exactly
when the number $\mathbf{x}$, expressed in b-adic notation, is a special axiom.


It is a stipulation rather than an observation because we want to allow the widest 
variety of “practical” axiomatizations of arithmetic over the alphabet (1) of p. 265. 
If we restrict attention to the particular Peano axiomatization, then one can actually 
prove (with some effort) that the set of special axioms is in fact primitive recursive 
(cf. [53]). 


Then 


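$$inProof(\mathbf{y}, \mathbf{x}) \equiv Seq(\mathbf{y}) \land (\forall i)_{< lh(\mathbf{y})}\Big(SP((\mathbf{y})_i) \lor (\mathbf{y})_i \in \Lambda_1 \lor (\exists j, k)_{< i}\big(MP((\mathbf{y})_j, (\mathbf{y})_k, (\mathbf{y})_i) \lor MP((\mathbf{y})_k, (\mathbf{y})_j, (\mathbf{y})_i)\big)\Big) \land (\exists i)_{< lh(\mathbf{y})}\, \mathbf{x} = (\mathbf{y})_i$$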


A.4.6 Lemma. The set of all theorems of any axiomatization of arithmetic satisfying
the stipulations (1)-(5) above (p. 264) is semi-computable.


Proof. Let $AR$ be the set of all theorems of an axiomatization that is as stated above.
By the preceding lemma, $\mathbf{x} \in AR \equiv (\exists \mathbf{y})\, inProof(\mathbf{y}, \mathbf{x})$. □


A.4.7 Remark. Thus (as we remarked long ago, p. 7; see also footnote 8 on that
page), for any axiomatization of arithmetic that satisfies the stipulations (1)-(5), there
is a computer program that without any input, and if it is allowed to run forever, will
print (in coded form, as numbers) all its theorems. All the program has to do is
to use as a subprogram a URM that computes a primitive recursive $f$ that has $AR$
as its range (cf. A.3.91). This computer program will behave simply as follows:
for $i = 0, 1, 2, 3, \ldots$ print $f(i)$. □
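
The printing program can be imitated in a few lines. The sketch below does not use the primitive recursive enumerator $f$ of A.3.91; instead it walks through all pairs $(\mathbf{y}, \mathbf{x})$ and emits $\mathbf{x}$ whenever a decidable test accepts $\mathbf{y}$ as a proof of $\mathbf{x}$, which is a different but equally standard way to list a semi-computable set. The relation `toy_in_proof` is a stand-in invented for this illustration (it is not a proof checker for arithmetic): it accepts $\mathbf{y}$ as a “proof” of $\mathbf{x}$ when $\mathbf{y}$ is a proper divisor of $\mathbf{x}$ greater than 1.

```python
from itertools import count, islice


def pairs():
    """All pairs (y, x) of natural numbers, diagonal by diagonal."""
    for s in count():
        for y in range(s + 1):
            yield y, s - y


def theorems(in_proof):
    """List every x that has some witness y with in_proof(y, x), once each."""
    seen = set()
    for y, x in pairs():
        if x not in seen and in_proof(y, x):
            seen.add(x)
            yield x


def toy_in_proof(y: int, x: int) -> bool:
    """Hypothetical decidable stand-in for inProof(y, x)."""
    return 1 < y < x and x % y == 0


first_five = list(islice(theorems(toy_in_proof), 5))
```

With this toy relation the “theorems” are exactly the composite numbers. The generator never terminates on its own, just as the program of the remark runs forever.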


Let us call complete arithmetic (CA) the set of all closed formulae over our
language of arithmetic ((1) on p. 265), i.e., formulae with no free variables, also
called sentences, that are true in $\mathfrak{N}$:


$$CA = \{\text{closed } A : \models_{\mathfrak{N}} A\}$$


We make some preliminary observations before we state and prove Gödel’s first
incompleteness theorem.
A.4.8 Exercise. We have noted that the “informal” number $n$ is captured in the language
of arithmetic by the term

$$\underbrace{SS \ldots S}_{\text{length } n} 0$$

which we abbreviate as $\tilde{n}$. Make this statement precise by proving, using the table
on p. 264 and simple induction on $n \ge 1$, that $(S^n 0)^{\mathfrak{N}} = n$, where I used the
abbreviation “$S^n$” for the string $SS \ldots S$ of length $n$. □


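Following on the above and noting that $\overline{n}^{\mathfrak{N}} = n$ (A.1.5(5)), we have $(\tilde{n})^{\mathfrak{N}} = \overline{n}^{\mathfrak{N}}$,
having written $\tilde{n}$ for $S^n 0$ and $\overline{n}$ for the constant imported for $n$. That is (A.1.6),


$$\models_{\mathfrak{N}} \tilde{n} = \overline{n} \qquad (1)$$


Since (by first-order soundness and Ax6) $\models \tilde{n} = \overline{n} \to (A[\tilde{n}] \equiv A[\overline{n}])$^37 for any
formula $A$ over the language of arithmetic, we obtain, in particular,
$\models_{\mathfrak{N}} \tilde{n} = \overline{n} \to (A[\tilde{n}] \equiv A[\overline{n}])$ and hence


$$\models_{\mathfrak{N}} A[\tilde{n}] \equiv A[\overline{n}] \qquad (2)$$


By iterating this a finite number of times we can prove things such as
$\models_{\mathfrak{N}} A[\tilde{n}, \overline{m}, \tilde{k}, \overline{j}] \equiv A[\overline{n}, \overline{m}, \overline{k}, \overline{j}]$, mixing at will “formal numbers” $\tilde{n}$ with imported
constants $\overline{n}, \overline{m}, \overline{k}, \ldots$ in the left hand side.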

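Pause. This is neat. Could we then go back and argue A.1.20 as follows?

We are given an interpretation $\mathfrak{D} = (D, M)$, a $\mathfrak{D}$-formula $A$, and a $\mathfrak{D}$-term $t$ such
that $t^{\mathfrak{D}} = i$ for some $i \in D$. We want


$$(A[\mathbf{x} := t])^{\mathfrak{D}} = (A[\mathbf{x} := \overline{i}])^{\mathfrak{D}} \qquad (3)$$


Since $\vdash t = \overline{i} \to (A[\mathbf{x} := t] \equiv A[\mathbf{x} := \overline{i}])$ as above, we have in particular
$\models_{\mathfrak{D}} t = \overline{i} \to (A[\mathbf{x} := t] \equiv A[\mathbf{x} := \overline{i}])$. Taking also $t^{\mathfrak{D}} = \overline{i}^{\mathfrak{D}}$ into account (that is,
$(t = \overline{i})^{\mathfrak{D}} = \mathbf{t}$) we get $\models_{\mathfrak{D}} (A[\mathbf{x} := t] \equiv A[\mathbf{x} := \overline{i}])$. Or, in different notation, we
have (3) above. As for $(s[\mathbf{x} := t])^{\mathfrak{D}} = (s[\mathbf{x} := \overline{i}])^{\mathfrak{D}}$ we can use 7.0.8 rather than
Ax6. Right?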


We will need one last preparatory item: 


A.4.9 Definition. (Correctness; [48]) An axiomatization of arithmetic is termed
“correct” precisely when the standard interpretation $\mathfrak{N}$ is a model of the set of
the special axioms. □


Correctness is not the same as soundness. All first-order theories satisfy the soundness
theorem (A.1.21). However, there are consistent axiomatizations of arithmetic
that are not correct ([48, 53]).


37 The notation “[...]” was introduced in Definition A.1.14.
A.4.10 Theorem. (Gödel’s First Incompleteness Theorem) Any axiomatic system
for arithmetic that satisfies (1)-(5) of p. 264 and, moreover, is correct must be
incomplete in the sense that its set of theorems cannot contain the set CA: There
will be true sentences of arithmetic that the system cannot prove.


Proof. We will show that if Gödel’s theorem fails, then we can build a URM that
solves the halting problem, a known impossibility (A.3.88).

Let us then have a set of (special) axioms over the language of arithmetic (an
axiomatization) that satisfies (1)-(5) and that contains as theorems all of the set CA.
Thus, if $\Theta$ is the set of theorems of the axiomatization,


$$CA \subseteq \Theta \qquad (*)$$

We note that $\Theta$ is semi-computable (A.4.6); thus, for some recursive $Q(\mathbf{y}, \mathbf{x})$,


$$\mathbf{x} \in \Theta \equiv (\exists \mathbf{y}) Q(\mathbf{y}, \mathbf{x}) \qquad (**)$$


We will accept here a fact proved in the next section: The relation $\phi_{\mathbf{x}}(\mathbf{x})\uparrow$ is definable
in $\mathfrak{N}$ (cf. A.1.14). More specifically, we can find a formula of arithmetic, $A$, of one
free variable $\mathbf{x}$ such that, for all $i \in \mathbb{N}$, $\phi_i(i)\uparrow \;\equiv\; \models_{\mathfrak{N}} A[\overline{i}]$. In view of the
remarks preceding the theorem, we can rewrite the last equivalence as


$$\text{for all } i \in \mathbb{N}, \quad \phi_i(i)\uparrow \;\equiv\; \models_{\mathfrak{N}} A[\tilde{i}] \qquad (***)$$


This is extremely useful, since $A[\tilde{i}]$ is over the alphabet (1) of p. 265 and thus we
can code each true “statement” of the form “$\phi_i(i)\uparrow$” by a natural number (whose
35-adic notation is) $A[\tilde{i}]$.

Here then is how we can solve the halting problem. We are given a number $i$ and
are asked to compute an answer to the question “$i \in K$?”, that is, “$\phi_i(i)\downarrow$?”


(a) We compute the number that the b-adic string $A[\tilde{i}]$ represents; say it is $b$.


(b) Using some fixed verifier for (**) and a URM that computes the function
$\lambda\mathbf{x}.d((\mu\mathbf{z})T(\mathbf{x}, \mathbf{x}, \mathbf{z}))$ we start simultaneously computing the answers to
“$b \in \Theta$?” and “$\phi_i(i) = ?$”


(c) By (*), if $\phi_i(i)\uparrow$ is true, then $A[\tilde{i}] \in \Theta$. Conversely, if $A[\tilde{i}] \in \Theta$, then as
correctness guarantees that the special axioms are true in $\mathfrak{N}$ and as soundness
guarantees that proofs propagate truth, we have that $\models_{\mathfrak{N}} A[\tilde{i}]$, that is, $\phi_i(i)\uparrow$
is true. Thus, if the first process halts, then we stop everything and proclaim
“$i \notin K$” (i.e., “$\phi_i(i)\uparrow$”).

(d) If the second process halts, then we stop everything and proclaim “$i \in K$” (i.e.,
“$\phi_i(i)\downarrow$”).

Thus, (*) is untenable.
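
The only algorithmic ingredient in steps (b) through (d) is running two searches “simultaneously” and stopping as soon as one of them halts. A minimal Python sketch of that mechanism follows; it uses generators as stand-ins for the two processes, and the predicates passed to `search` are toy placeholders, not the verifier for $\Theta$ or the Kleene predicate. The names `search` and `race` are ours.

```python
from itertools import count


def search(pred):
    """A semi-decision procedure as a generator: it examines one candidate
    witness per step and finishes as soon as pred(y) holds for some y."""
    for y in count():
        if pred(y):
            return
        yield


def race(first, second):
    """Alternate single steps of two processes; report which one halts first."""
    while True:
        for name, proc in (("first", first), ("second", second)):
            try:
                next(proc)
            except StopIteration:
                return name


# Toy run: the first search never succeeds, the second finds the witness 7.
winner = race(search(lambda y: False), search(lambda y: y == 7))
```

In the proof the point is that exactly one of the two processes is guaranteed to halt, so the race always ends with a verdict; if neither search could succeed, `race` would loop forever.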

The hand-waving can be (mostly) eliminated simply: Let us call $\lambda i.f(i)$ the
function that on input $i$ computes the number whose 35-adic notation is $A[\tilde{i}]$. Once
we have pinned down the formula $A$ (in the next section), we can see that finding the
number

$$A[\mathbf{x} := \underbrace{SS \ldots S}_{\text{length } i} 0]$$

tedious as it may be, is rather computationally straightforward (if you do not agree
see also the following remark!). In the absence of details we can readily accept that
$f \in \mathcal{R}$ (and more work can show that actually $f \in \mathcal{PR}$; see remark below). But
then, setting $g = \lambda i.(\mu\mathbf{y})(Q(\mathbf{y}, f(i)) \lor T(i, i, \mathbf{y}))$, we see that, first, $g \in \mathcal{R}$ since it
certainly is in $\mathcal{P}$, and is total (why?). Second,


$$i \in K \equiv T(i, i, g(i))$$ □


Note the emphasized any in the theorem. It draws attention to the fact that we have
not fixed any particular correct and “reasonable” theory: whatever we said holds for
all such theories. Any, moreover, implies that every theory that the theorem speaks
of misses an infinite chunk of CA. Indeed, if it misses only a finite chunk, $S$, then
adding all the formulae of $S$ as special axioms we still get a recursive set of special
axioms (closure properties, and the fact that finite sets are recursive) and one that is
still correct (why?).

But the set of theorems of this new axiom set covers all of CA (why?) contrary to
Gödel’s theorem.


A.4.11 Remark. (Tarski’s Trick) We know (“one point rule”, 3, p. 175) that
$\vdash (\exists \mathbf{x})(\mathbf{x} = \tilde{i} \land A) \equiv A[\tilde{i}]$ and hence $\models (\exists \mathbf{x})(\mathbf{x} = \tilde{i} \land A) \equiv A[\tilde{i}]$.

Thus, in the proof above, we might as well input to our verifier the number
represented in 35-adic by “$(\exists \mathbf{x})(\mathbf{x} = \tilde{i} \land A)$”, rather than inputting the one represented
by “$A[\tilde{i}]$”. But why would we want to deal with a more complex counterpart of
$A[\tilde{i}]$? Because it is easier to compute the number represented by the more complex
formula as long as we have computed once and for all the constant that has as 35-adic
notation the formula $A$! The complex formula isolates $\tilde{i}$ in just one place, to the left
of $A$. The “simpler” formula $A[\tilde{i}]$, on the other hand, may have $\tilde{i}$ occur in several
places, making it very hard to reuse (the number) $A$.

Here is the calculation in detail:

First, we want a primitive recursive function “tilde” such that, for all $n \ge 0$,
$tilde(n)$ is the number whose 35-adic representation is $S^n 0$. We understand $S^0 0$ as
a verbose way to say “0”. Since the number so represented by $S^n 0$ is
$12 + 32(35 + 35^2 + \cdots + 35^n)$, we have that

$$tilde(n) = 12 + 32 \sum_{i=1}^{n} 35^i$$


Let next $f$ be as in the above proof, except that it refers to $(\exists \mathbf{x})(\mathbf{x} = \tilde{i} \land A)$ rather
than to $A[\tilde{i}]$. To fix notation, we take the formal variable $\mathbf{x}$ (which is unspecified
in the notation “$\mathbf{x}$”) to be $x$ (lightface!) of the alphabet (1). Let $a$ be the (constant!)
number represented (always in 35-adic) by the string “$\land A)$” and $c$ be the number
represented by the string “$(\exists x)(x = $”. Then, for all $i \in \mathbb{N}$,


$$f(i) = c * tilde(i) * a$$


See also the definition of concatenation, “$*$”, on p. 268. Thus $f$ is primitive recursive
as promised. □
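
The arithmetic behind $tilde$ and $f$ is easy to check by machine. The sketch below takes the base 35 and the digit values 12 (for “0”) and 32 (for “S”) from the footnotes of this section; the two constants passed as $c$ and $a$ in the usage check are arbitrary digit strings chosen for illustration, since the actual codes of “$(\exists x)(x = $” and “$\land A)$” depend on the formula $A$ and on the rest of the alphabet. Function names are ours.

```python
B, ZERO, S = 35, 12, 32   # base, and the digit values of "0" and "S"


def value(digits):
    """Number whose 35-adic notation is the given digit sequence."""
    n = 0
    for d in digits:
        n = n * B + d
    return n


def length(x: int) -> int:
    """35-adic length of x."""
    n = 0
    while x > 0:
        x = (x - 1) // B
        n += 1
    return n


def concat(x: int, y: int) -> int:
    """x * y = x * 35^|y| + y."""
    return x * B ** length(y) + y


def tilde(n: int) -> int:
    """Code of S^n 0 via the closed form 12 + 32(35 + 35^2 + ... + 35^n)."""
    return ZERO + S * sum(B ** k for k in range(1, n + 1))


def f(i: int, c: int, a: int) -> int:
    """f(i) = c * tilde(i) * a."""
    return concat(concat(c, tilde(i)), a)
```

The design point of the remark shows up directly: $c$ and $a$ are computed once, and only the middle segment $tilde(i)$ varies with $i$.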


A.4.12 Exercise. Prove that every finite set is primitive recursive. □


A.4.13 Corollary. The set CA is not semi-recursive.


Proof. Assume that CA is semi-recursive; thus for some recursive relation $Q$^38 we
have $\mathbf{x} \in CA \equiv (\exists \mathbf{y}) Q(\mathbf{y}, \mathbf{x})$. We use a trivial variation on the proof of A.4.10
and still solve the halting problem! This time we change (b) to say: “Using some
fixed verifier for $\mathbf{x} \in CA$ and a URM for the function $\lambda\mathbf{x}.d((\mu\mathbf{z})T(\mathbf{x}, \mathbf{x}, \mathbf{z}))$ we
start simultaneously computing $b \in CA$ and $\phi_i(i)$.” The subprocess (c) now simply
reads, “Thus, if the verifier halts, then we stop everything and proclaim that $i \notin K$”.
All else is the same. □


A.4.14 Corollary. The set CA is not recursive. 


Proof. By A.3.85. □


A.4.14 is a weak form of Church’s theorem. The latter actually says that the set of
all theorems of any axiomatization of arithmetic as described in Gödel’s theorem is
not recursive ([3, 4]). Nevertheless, the corollary is useful in pointing out that the
“trivial solution” toward obtaining as theorems of arithmetic all of CA is wrong:
This solution would be to take all of CA as the set of special axioms, but then the set
of special axioms would cease being practical (recursive).


A.4.15 Remark. The original version of Gödel’s first incompleteness theorem ([16])
was stated exclusively in syntactic terms: In any recursive and ω-consistent extension
of Peano arithmetic there will be undecidable sentences in the language of arithmetic,
that is, closed formulae $A$ such that neither $A$ nor $\lnot A$ is provable.

ω-consistency of an axiomatic number theory is the property that for no formula
$A$ of one free variable $\mathbf{x}$ is it possible to have all of $\lnot A[\tilde{n}]$ (for $n \in \mathbb{N}$) and $(\exists \mathbf{x})A$
provable. This condition implies, but is not implied by, consistency. Rosser ([42])
strengthened the incompleteness theorem to read: In any recursive and consistent
extension of Peano arithmetic there will be undecidable sentences.

The reader will note that in our semantic version correctness took the role of
consistency (indeed, every axiom set that has a model, here $\mathfrak{N}$, is necessarily
consistent; cf. A.1.22). Any true formula $A[\tilde{i}]$ ($A$ as in the proof of A.4.10) that
is unprovable (and the essence of the proof is that not all such true sentences are
provable) is undecidable in the sense of Gödel: Indeed, its negation is false, thus it
cannot be provable either by correctness and soundness.


38 This is a different “Q” than the one in the proof of Gödel’s theorem, of course, but I use the same letter
here so that the changes needed to adapt said proof to the one for the corollary are minimal.

Gödel’s original proof relied on a syntactic version of the liar’s paradox, the latter
being the paradoxical (semantic) utterance “I am lying”.^39 Gödel formulated within
Peano arithmetic the self-referential statement, “I am not a theorem”, and showed it
to be an undecidable sentence.

The phenomenon of self-reference is excellently explored in Hofstadter’s Gödel,
Escher, Bach ([23]). A related gripping (fictional) account of a brilliant reclusive
mathematician’s efforts to settle “Goldbach’s Conjecture”, and how he was shocked to
learn of Gödel’s incompleteness theorems, is Doxiadis’s Uncle Petros and Goldbach’s
Conjecture ([12]). □


A.4.16 Exercise. If a set of special axioms $\Gamma$ over the language of arithmetic is
ω-consistent, then it is also consistent. □


A.4.1 Supplement: $\phi_{\mathbf{x}}(\mathbf{x})\uparrow$ Is First-Order Definable in $\mathfrak{N}$


We will continue using boldface for variables in the metatheory to avoid confusion
with the formal x, y, etc., listed in (1) on p. 265. Let us define inductively the set of
arithmetical relations of the metatheory.^40


A.4.17 Definition. The set of arithmetical relations is the smallest set of relations
that:


(A) Contains the “initial” relations $\mathbf{z} = \mathbf{x} + \mathbf{y}$, $\mathbf{z} = \mathbf{x} \cdot \mathbf{y}$, and $\mathbf{z} = ex(\mathbf{x}, \mathbf{y})$ (cf.
A.3.34), where $\mathbf{x}, \mathbf{y}, \mathbf{z}$ are distinct variables in the metatheory


and, moreover,

(B) If $Q(\vec{\mathbf{x}})$ and $P(\vec{\mathbf{y}})$ are in the set, then so are $\lnot Q(\vec{\mathbf{x}})$ and $Q(\vec{\mathbf{x}}) \lor P(\vec{\mathbf{y}})$.

(C) If $R(\mathbf{y}, \vec{\mathbf{x}})$ is in the set, then so is $(\forall \mathbf{y}) R(\mathbf{y}, \vec{\mathbf{x}})$.

(D) If $Q(\vec{\mathbf{x}})$ is in the set, then so are all its explicit transformations.


Explicit transformations ([47, 1]) are exactly the following: substitution of
any constant into a variable, expansion of the variables-list by “don’t care”
variables (arguments), permutation of variables, identification of variables;
that is, Grzegorczyk operations (ii)-(iv) (cf. A.3.26), albeit applied to relations.

□


39 Epimenides of Crete actually proclaimed a slightly different statement: “All Cretans are liars.”

40 The arithmetical relations have a lot of tolerance for variations in their definition: Sometimes as much
as all of $\mathcal{R}_*$ is taken as the “initial” arithmetical relations. Sometimes as little as $\mathbf{z} = \mathbf{x} + \mathbf{y}$ and
$\mathbf{z} = \mathbf{x} \cdot \mathbf{y}$. For technical convenience we have added the graph of exponentiation rather than choosing the
most minimalist approach.

Clearly the set of arithmetical relations is closed under the remaining Boolean
connectives and $(\exists \mathbf{y})$.


A.4.18 Lemma. Every arithmetical relation is first-order definable in $\mathfrak{N}$ over a
language, $L$, of arithmetic that contains beyond the functions $S$, $+$, and $\times$ a function
symbol for exponentiation.


Proof. To keep the alphabet fixed to that of (1) on p. 265 the exponentiation function
symbol (of arity 2) will be taken to be “$(\times)$” whose natural semantics in $\mathfrak{N}$ are:
$(\times)^{\mathfrak{N}} = \lambda\mathbf{xy}.ex(\mathbf{x}, \mathbf{y})$. We remind ourselves of the table on p. 264 that states the
semantics of the remaining nonlogical symbols, and proceed by induction along the
cases of Definition A.4.17. The basis contains three cases, $\mathbf{z} = \mathbf{x} + \mathbf{y}$ and $\mathbf{z} = \mathbf{x} \cdot \mathbf{y}$
and $\mathbf{z} = ex(\mathbf{x}, \mathbf{y})$.

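Thus, writing the term “$+xy$” in the friendlier infix notation, we have that
$z = x + y$^41 defines (cf. A.1.14) $\mathbf{z} = \mathbf{x} + \mathbf{y}$, since for any $a, b, c$ in $\mathbb{N}$,^42


$$(\overline{a} = \overline{b} + \overline{c})^{\mathfrak{N}} = \mathbf{t} \text{ iff } (\overline{a}^{\mathfrak{N}} = \overline{b}^{\mathfrak{N}} +^{\mathfrak{N}} \overline{c}^{\mathfrak{N}}) = \mathbf{t} \quad (\text{cf. A.1.6})$$
$$\text{iff } a = b + c$$


An entirely analogous case can be made for $\mathbf{z} = \mathbf{x} \cdot \mathbf{y}$ and $\mathbf{z} = ex(\mathbf{x}, \mathbf{y})$. For
example (using the formal exponentiation “$(\times)$” in infix notation),


$$(\overline{a} = \overline{b}\,(\times)\,\overline{c})^{\mathfrak{N}} = \mathbf{t} \text{ iff } (\overline{a}^{\mathfrak{N}} = (\times)^{\mathfrak{N}}(\overline{b}^{\mathfrak{N}}, \overline{c}^{\mathfrak{N}})) = \mathbf{t}$$
$$\text{iff } a = ex(b, c)$$


We leave it to the reader to verify that if $R(\vec{\mathbf{x}})$ and $Q(\vec{\mathbf{y}})$ are defined by the formulae
$A$ and $B$ respectively, then $\lnot R(\vec{\mathbf{x}})$ and $R(\vec{\mathbf{x}}) \lor Q(\vec{\mathbf{y}})$ are defined by $\lnot A$ and $A \lor B$
respectively.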

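Next, we show that $(\forall \mathbf{y}) R(\mathbf{y}, \vec{\mathbf{x}}_r)$ is defined by $(\forall y) A$, if $R(\mathbf{y}, \vec{\mathbf{x}}_r)$ is defined by
$A$. So, let without loss of generality $y, x_1, \ldots, x_r$ be all the free variables of $A$. We
have for any $c, b_1, \ldots, b_r$ in $\mathbb{N}$ that


$$R(c, b_1, \ldots, b_r) \text{ iff } (A[\overline{c}, \overline{b_1}, \ldots, \overline{b_r}])^{\mathfrak{N}} = \mathbf{t} \qquad (1)$$


Now, we fix $b_1, \ldots, b_r$ in $\mathbb{N}$. $(\forall \mathbf{y}) R(\mathbf{y}, b_1, \ldots, b_r)$ holds iff for all $c \in \mathbb{N}$ we have
that $R(c, b_1, \ldots, b_r)$ holds. By (1) this is equivalent to “for all $c \in \mathbb{N}$ we have
$(A[\overline{c}, \overline{b_1}, \ldots, \overline{b_r}])^{\mathfrak{N}} = \mathbf{t}$”.

By A.1.6 the latter says precisely $((\forall y) A[\overline{b_1}, \ldots, \overline{b_r}])^{\mathfrak{N}} = \mathbf{t}$.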

We conclude by looking into explicit transformations. Let then $Q(\mathbf{y}, \vec{\mathbf{x}}_r)$ be
defined by the formula (over $L$) $A$. Then, for any fixed $i \in \mathbb{N}$, $Q(i, \vec{\mathbf{x}}_r)$ is
clearly defined by $A[\overline{i}]$, since for all $a, b_1, \ldots, b_r$ we have $Q(a, b_1, \ldots, b_r)$ iff
$\models_{\mathfrak{N}} A[\overline{a}, \overline{b_1}, \ldots, \overline{b_r}]$ and thus $Q(i, b_1, \ldots, b_r)$ iff $\models_{\mathfrak{N}} A[\overline{i}, \overline{b_1}, \ldots, \overline{b_r}]$. While
$A[\overline{i}]$ is not over the language of arithmetic, we may use instead $A[\tilde{i}]$, which is
(cf. remarks following A.4.8). The case of identifying or permuting variables being
trivial, we conclude by looking at the case of adding one “don’t care” variable
(extensible to any fixed number by a trivial induction). So let $A$, over $L$, define $Q(\vec{\mathbf{x}}_r)$
and let $\mathbf{z}$ be a new metamathematical variable. I take $z$, a variable different from all
the free variables of $A$ (which, without loss of generality, are $x_1, \ldots, x_r$) and argue
that $A \land z = z$ defines the relation $R = \lambda\mathbf{z}\vec{\mathbf{x}}_r.Q(\vec{\mathbf{x}}_r)$:

We have, on one hand, for all $b_1, \ldots, b_r$: $Q(b_1, \ldots, b_r)$ iff $(A[\overline{b_1}, \ldots, \overline{b_r}])^{\mathfrak{N}} = \mathbf{t}$.
On the other hand, for all $c, b_1, \ldots, b_r$, $Q(b_1, \ldots, b_r) \equiv R(c, b_1, \ldots, b_r)$ and


$(A[\overline{b_1}, \ldots, \overline{b_r}])^{\mathfrak{N}} =^{43} (A[\overline{b_1}, \ldots, \overline{b_r}] \land \overline{c} = \overline{c})^{\mathfrak{N}}$ hold. □


41 I used the specific formal variables x, y, z here. Any other set of distinct formal variables will also
work. For example, $x_{191} = x_{222} + x_{303}$ also defines $\mathbf{z} = \mathbf{x} + \mathbf{y}$.
42 In brackets we have a formal formula over the language $L(\mathfrak{N})$: $(z = x + y)[z := \overline{a}][x := \overline{b}][y := \overline{c}]$.


To show that $\phi_{\mathbf{x}}(\mathbf{x})\uparrow$ is first-order definable in $\mathfrak{N}$ over $L$ it suffices, because of
the lemma, to prove that it is arithmetical. In turn, since $\phi_{\mathbf{x}}(\mathbf{x})\uparrow \;\equiv\; \lnot(\exists \mathbf{y})T(\mathbf{x}, \mathbf{x}, \mathbf{y})$,
it suffices to prove that the Kleene predicate is arithmetical. It will so follow if we
can prove that for every function $f \in \mathcal{PR}$ its graph is arithmetical, for then if $\chi_T$ is
the characteristic function of $T$, we will have that $\chi_T(\mathbf{x}, \mathbf{y}, \mathbf{z}) = \mathbf{w}$ (and therefore
$\chi_T(\mathbf{x}, \mathbf{y}, \mathbf{z}) = 0$ by explicit transformation) is arithmetical.^44


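A.4.19 Lemma. The following relations are arithmetical.

(1) $\mathbf{x} = 0$ (and hence $\mathbf{x} \ne 0$)
(2) $\mathbf{x} \le \mathbf{y}$ (and hence $\mathbf{x} < \mathbf{y}$)
(3) $\mathbf{z} = \mathbf{x} \dot{-} \mathbf{y}$
(4) $\mathbf{x} \mid \mathbf{y}$
(5) $Pr(\mathbf{x})$
(6) $Seq(\mathbf{z})$
(7) $Next(\mathbf{x}, \mathbf{y})$ (meaning $\mathbf{x} < \mathbf{y}$ are consecutive primes)
(8) $pow(\mathbf{z}, \mathbf{x}, \mathbf{y})$ (meaning $\mathbf{x} > 1$ and $ex(\mathbf{x}, \mathbf{y})$ is the highest power of $\mathbf{x}$ dividing $\mathbf{z}$)
(9) $Q(\mathbf{z})$ (meaning $\mathbf{z}$ has the form $p_0 p_1^2 p_2^3 \cdots p_n^{n+1}$ for some $n$)
(10) $\mathbf{y} = p_{\mathbf{n}}$
(11) $\mathbf{z} = exp(\mathbf{x}, \mathbf{y})$

□


Note. We need not worry about bounding our quantifications, for it is not our purpose
to show these relations in $\mathcal{PR}_*$. Indeed we know from earlier work that they are in
this set. This time we simply want to show that they are arithmetical.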


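Proof.

(1) $\mathbf{x} = 0$ (and hence $\mathbf{x} \ne 0$): $\mathbf{x} = 0$ is an explicit transform of $\mathbf{x} = \mathbf{y} + \mathbf{z}$; $\mathbf{x} \ne 0$ is
obtained by negation.

(2) $\mathbf{x} \le \mathbf{y}$ (and hence $\mathbf{x} < \mathbf{y}$): This is equivalent to $(\exists \mathbf{z})(\mathbf{x} + \mathbf{z} = \mathbf{y})$.

(3) $\mathbf{z} = \mathbf{x} \dot{-} \mathbf{y}$: This is equivalent to $\mathbf{z} = 0 \land \mathbf{x} \le \mathbf{y} \lor \mathbf{x} = \mathbf{z} + \mathbf{y}$.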


43 Equality on the set $\{\mathbf{t}, \mathbf{f}\}$.

44 Gödel proved all this without the need to have exponentiation. However, adopting this operation makes
things considerably easier and, as mentioned earlier (footnote 40 on p. 275), it does not change the set of
arithmetical relations.


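(4) $\mathbf{x} \mid \mathbf{y}$: This is equivalent to $(\exists \mathbf{z})\mathbf{y} = \mathbf{xz}$ (I am using “implied multiplication”
throughout: “$\mathbf{xy}$” rather than “$\mathbf{x} \times \mathbf{y}$” or “$\mathbf{x} \cdot \mathbf{y}$”).

(5) $Pr(\mathbf{x})$: This is equivalent to $\mathbf{x} > 1 \land (\forall \mathbf{y})(\mathbf{y} \mid \mathbf{x} \to \mathbf{y} = 1 \lor \mathbf{y} = \mathbf{x})$.

(6) $Seq(\mathbf{z})$: This is equivalent to $\mathbf{z} > 1 \land (\forall \mathbf{x})(\forall \mathbf{y})(Pr(\mathbf{x}) \land Pr(\mathbf{y}) \land \mathbf{x} < \mathbf{y} \land
\mathbf{y} \mid \mathbf{z} \to \mathbf{x} \mid \mathbf{z})$.

(7) $Next(\mathbf{x}, \mathbf{y})$: This is equivalent to $Pr(\mathbf{x}) \land Pr(\mathbf{y}) \land \mathbf{x} < \mathbf{y} \land \lnot(\exists \mathbf{z})(Pr(\mathbf{z}) \land
\mathbf{x} < \mathbf{z} \land \mathbf{z} < \mathbf{y})$.

(8) $pow(\mathbf{z}, \mathbf{x}, \mathbf{y})$: This is equivalent to $\mathbf{x} > 1 \land ex(\mathbf{x}, \mathbf{y}) \mid \mathbf{z} \land \lnot\, ex(\mathbf{x}, \mathbf{y} + 1) \mid \mathbf{z}$.^45

(9) $Q(\mathbf{z})$: This is equivalent to $Seq(\mathbf{z}) \land \lnot\, 4 \mid \mathbf{z} \land (\forall \mathbf{x})(\forall \mathbf{y})(Next(\mathbf{x}, \mathbf{y}) \land \mathbf{y} \mid \mathbf{z} \to
(\exists \mathbf{w})(pow(\mathbf{z}, \mathbf{x}, \mathbf{w}) \land pow(\mathbf{z}, \mathbf{y}, \mathbf{w} + 1)))$.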
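(10) $\mathbf{y} = p_{\mathbf{n}}$: This is equivalent to $Pr(\mathbf{y}) \land (\exists \mathbf{z})(Q(\mathbf{z}) \land pow(\mathbf{z}, \mathbf{y}, \mathbf{n} + 1))$.

(11) $\mathbf{z} = exp(\mathbf{x}, \mathbf{y})$: This is equivalent to $(\exists \mathbf{w})(pow(\mathbf{y}, \mathbf{w}, \mathbf{z}) \land \mathbf{w} = p_{\mathbf{x}})$. □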
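
The definitions in this proof can be tested on small numbers. The Python sketch below transcribes items (4) through (11) with all quantifiers cut down to finite ranges; the bounds, the parameter `bound`, and the function names are our own additions (the arithmetical definitions themselves use unbounded quantifiers). In `nth_prime` the candidate $\mathbf{y}$ is also required to be prime: without that guard the composite 6 would pass the test for $\mathbf{n} = 0$, as the last line of the sketch illustrates with $\mathbf{z} = 18$.

```python
def div(x: int, y: int) -> bool:
    """x | y, i.e. y = x*z for some z."""
    return y == 0 if x == 0 else y % x == 0


def pr(x: int) -> bool:
    return x > 1 and all(y in (1, x) for y in range(1, x + 1) if div(y, x))


def seq(z: int) -> bool:
    """Every prime below a prime divisor of z also divides z."""
    return z > 1 and all(div(x, z)
                         for y in range(2, z + 1) if div(y, z) and pr(y)
                         for x in range(2, y) if pr(x))


def nxt(x: int, y: int) -> bool:
    """x < y are consecutive primes."""
    return pr(x) and pr(y) and x < y and not any(pr(z) for z in range(x + 1, y))


def pw(z: int, x: int, y: int) -> bool:
    """x > 1 and x^y is the highest power of x dividing z."""
    return x > 1 and div(x ** y, z) and not div(x ** (y + 1), z)


def q(z: int) -> bool:
    """z has the form p_0 * p_1^2 * ... * p_n^(n+1)."""
    return (not div(4, z) and seq(z) and
            all(any(pw(z, x, w) and pw(z, y, w + 1) for w in range(z))
                for y in range(2, z + 1) if div(y, z) and pr(y)
                for x in range(2, y) if nxt(x, y)))


def nth_prime(n: int, bound: int) -> int:
    """The y with y = p_n, searching y and the witness z below `bound`."""
    return next(y for y in range(2, bound)
                if pr(y) and any(pw(z, y, n + 1) and q(z) for z in range(2, bound)))


def exp_holds(z: int, x: int, y: int, bound: int) -> bool:
    """z = exp(x, y): p_x^z is the highest power of p_x dividing y."""
    return pw(y, nth_prime(x, bound), z)


# Why the primality guard: 6 is not p_0, yet pow(18, 6, 1) and Q(18) both hold.
six_would_pass = any(pw(z, 6, 1) and q(z) for z in range(2, 20))
```

For example, the witness for $p_2 = 5$ is $\mathbf{z} = 2 \cdot 3^2 \cdot 5^3 = 2250$, so a bound of 2251 is needed to find it.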

We can now prove the following theorem that concludes the business of this 
sub-section. 


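A.4.20 Theorem. For every $f \in \mathcal{PR}$, its graph $\mathbf{y} = f(\vec{\mathbf{x}}_n)$ is arithmetical.

Proof. We do induction on $\mathcal{PR}$ (cf. A.3.24):


(1) Basis. There are three graphs to work with here: $\mathbf{y} = \mathbf{x} + 1$, $\mathbf{y} = 0$ and
$\mathbf{y} = \mathbf{x}$ (or, fancily, $\mathbf{y} = \mathbf{x}_i$; or more fancily, $\mathbf{y} = U_i^n(\vec{\mathbf{x}}_n)$). They all are explicit
transforms of $\mathbf{y} = \mathbf{x} + \mathbf{z}$.


(2) Composition. Say, the property is true for the graphs of $f, g_1, \ldots, g_n$. This is the
induction hypothesis (I.H.). How about $\mathbf{y} = f(g_1(\vec{\mathbf{x}}_m), g_2(\vec{\mathbf{x}}_m), \ldots, g_n(\vec{\mathbf{x}}_m))$?
Well, this graph is equivalent to (by repeated application of the informal 3 on
p. 175)


$$(\exists \mathbf{u}_1) \cdots (\exists \mathbf{u}_n)(\mathbf{y} = f(\vec{\mathbf{u}}_n) \land \mathbf{u}_1 = g_1(\vec{\mathbf{x}}_m) \land \cdots \land \mathbf{u}_n = g_n(\vec{\mathbf{x}}_m))$$

and we are done by the I.H.


(3) Primitive recursion. This is the part that benefits from the work put into A.4.19.
Here’s why: Assume (I.H.) that the graphs of $h$ and $g$ are arithmetical, and let $f$
be given for all $\mathbf{x}, \vec{\mathbf{y}}$ by


$$f(0, \vec{\mathbf{y}}) = h(\vec{\mathbf{y}})$$
$$f(\mathbf{x} + 1, \vec{\mathbf{y}}) = g(\mathbf{x}, \vec{\mathbf{y}}, f(\mathbf{x}, \vec{\mathbf{y}}))$$

Now, to state $\mathbf{z} = f(\mathbf{x}, \vec{\mathbf{y}})$ is equivalent to stating

$$(\exists \mathbf{m}_0)(\exists \mathbf{m}_1) \cdots (\exists \mathbf{m}_{\mathbf{x}})\Big(\mathbf{m}_0 = h(\vec{\mathbf{y}}) \land \mathbf{z} = \mathbf{m}_{\mathbf{x}} \land (\forall \mathbf{w})(\mathbf{w} < \mathbf{x} \to \mathbf{m}_{\mathbf{w}+1} = g(\mathbf{w}, \vec{\mathbf{y}}, \mathbf{m}_{\mathbf{w}}))\Big) \qquad (i)$$


45 Note that $ex$ is that of A.3.34, and $ex(\mathbf{x}, \mathbf{y} + 1) \mid \mathbf{z} \equiv (\exists \mathbf{u})(\mathbf{u} = \mathbf{y} + 1 \land ex(\mathbf{x}, \mathbf{u}) \mid \mathbf{z})$, applying 3 on
p. 175 within the metatheory.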
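The trouble with the “relation” (i) above is that it is not a relation at all,^46 because
it has a variable-length prefix: $(\exists \mathbf{m}_0)(\exists \mathbf{m}_1) \cdots (\exists \mathbf{m}_{\mathbf{x}})$. We invoke coding to
salvage the argument. Let us use a single number,^47

$$\mathbf{m} = p_0^{\mathbf{m}_0} p_1^{\mathbf{m}_1} \cdots p_{\mathbf{x}}^{\mathbf{m}_{\mathbf{x}}}$$

to represent all the $\mathbf{m}_i$, for $i = 0, \ldots, \mathbf{x}$. Clearly,

$$\mathbf{m}_i = exp(i, \mathbf{m}), \text{ for } i = 0, \ldots, \mathbf{x}$$

We can now rewrite (i) as

$$(\exists \mathbf{m})\Big(exp(0, \mathbf{m}) = h(\vec{\mathbf{y}}) \land \mathbf{z} = exp(\mathbf{x}, \mathbf{m}) \land (\forall \mathbf{w})(\mathbf{w} < \mathbf{x} \to exp(\mathbf{w} + 1, \mathbf{m}) = g(\mathbf{w}, \vec{\mathbf{y}}, exp(\mathbf{w}, \mathbf{m})))\Big) \qquad (ii)$$

The above is arithmetical because of the I.H. Some parts of it are more complicated
than others. For example, the part

$$exp(\mathbf{w} + 1, \mathbf{m}) = g(\mathbf{w}, \vec{\mathbf{y}}, exp(\mathbf{w}, \mathbf{m}))$$

is equivalent to

$$(\exists \mathbf{u})(\exists \mathbf{v})(\mathbf{u} = exp(\mathbf{w} + 1, \mathbf{m}) \land \mathbf{v} = exp(\mathbf{w}, \mathbf{m}) \land \mathbf{u} = g(\mathbf{w}, \vec{\mathbf{y}}, \mathbf{v}))$$

The above is arithmetical by the I.H. and the preceding lemma. This completes
the proof. □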
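
The coding step can be exercised on a concrete primitive recursion. The sketch below builds the canonical witness $\mathbf{m} = p_0^{\mathbf{m}_0} \cdots p_{\mathbf{x}}^{\mathbf{m}_{\mathbf{x}}}$ for given $h$ and $g$ and then checks the three conjuncts of (ii) against it. It does not search for $\mathbf{m}$, as the unbounded quantifier $(\exists \mathbf{m})$ would suggest, and it uses a single parameter $\mathbf{y}$ for simplicity. The names `witness` and `holds_ii` and the sample functions are ours; the sample recursion defines multiplication, $f(\mathbf{x}, \mathbf{y}) = \mathbf{x}\mathbf{y}$.

```python
def nth_prime(i: int) -> int:
    """p_i, with p_0 = 2."""
    found, n = -1, 1
    while found < i:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return n


def exp(i: int, m: int) -> int:
    """Exponent of p_i in the prime factorization of m (m > 0)."""
    p, e = nth_prime(i), 0
    while m % p == 0:
        m //= p
        e += 1
    return e


def witness(x: int, y: int, h, g) -> int:
    """m = p_0^{m_0} ... p_x^{m_x} where m_0 = h(y), m_{w+1} = g(w, y, m_w)."""
    values = [h(y)]
    for w in range(x):
        values.append(g(w, y, values[w]))
    m = 1
    for i, v in enumerate(values):
        m *= nth_prime(i) ** v
    return m


def holds_ii(m: int, z: int, x: int, y: int, h, g) -> bool:
    """The matrix of formula (ii) for a given candidate code m."""
    return (exp(0, m) == h(y) and z == exp(x, m) and
            all(exp(w + 1, m) == g(w, y, exp(w, m)) for w in range(x)))


# Sample recursion: f(0, y) = 0, f(x + 1, y) = f(x, y) + y, i.e. f(x, y) = x*y.
h = lambda y: 0
g = lambda w, y, v: v + y
m = witness(3, 4, h, g)          # codes the values 0, 4, 8, 12
```

Here `holds_ii(m, 12, 3, 4, h, g)` is true, while changing the claimed value to 11, or tampering with one exponent of `m`, makes the check fail.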


46 Relation is the informal counterpart of formula, of course.
47 We do not need the fancier coding $\langle \mathbf{m}_0, \ldots, \mathbf{m}_{\mathbf{x}} \rangle$ here, since the length, $\mathbf{x}$, of the sequence is known.
of propositional calculus, 98
computability, xiii, xvi, 232
computable function, xvi 
computably enumerable sets, 262 
computation, xvi, 234 

arithmetization of, 254

terminating, 256
conjunctional, 92, 171
use of <, 64

use of =, 64
Boolean, 118
consistency theorem, 231
correct
recursive part of, 13
“definition of 3”, 173 
derivation, 258 
deterministically, 32 
domain, 196
double recursion, 249
Euclid, 252, 267
expansion, 132
of a state, 29
f, 26
falsum, 9
⊥, 9
Feferman, 225 
Fibonacci, 95 
sequence, 253 
finite state, 31 
first incompleteness theorem, 162, 272 
first-order semantics, 195
i.p., 15
I.S., 18


ID, 254 
initial, 256 
idempotency of ∨, 43
if-statement, 234 
iff, 12 
immediate predecessor, 15 
implication 
index
semi-computable, 259 
induction, xiv, 17 
complete, 17 
course-of-values, 17 
on formula complexity, 22 
on formulae, xv, 19 
definition by, 22 
on numbers, 17 
on PR, 244 
simple, 18 
strong, 17, 18 
induction hypothesis, 18 
induction step, 18 
infinitary, 29 
infinite loop, 242 
infix notation, 121 
initial function, 243 
initial truths, see axioms 41 
instantaneous description, 254 
intentionally, 122 
interpretation, 195, 212 
of a first-order language, 196 
domain of, 196 


typical notation for: $\mathfrak{D} = (D, M)$, 196
underlying set of, 196
of a first-order language formula
typical notation for: $A^{\mathfrak{D}}$, 196, 212
of a first-order language symbol
typical notation for: symbol$^{\mathfrak{D}}$, 196


of a formula, 196 
standard, 263 
intractable problem, 7 
iterator, 253 
iterator function, 240 


Kalmar, 232 
Kleene, xvi 
T-predicate, 254 
Kleene normal form theorem, 257 
Kleene predicate, xvi, 257 
Kleene star, 94
Kleene T-predicate, 257 
Knuth, 3 
Kuratowski’s, 122 
in a URM, 234
λ notation, 236
language, 5, 119 
first-order, 116, 121 
formal, 24 
Leibniz, 40, 77 
single formula, 100 
Levy, 225 
lexicographically, 95 
literal, 104 
Lobachevsky, 34 
logic, 3 
about, 4 
application of, 116 
applied first-order, 116 
Aristotelian, 26 
Boolean, 6 
calculational, 38 
classical, 26 
equational, 38 
first order, 6 
mathematical, 3 
predicate, 6 
propositional, 6 
aim of, 25 
sentential, 6 
sound, 89 
working within, 4 
logical deductions, 3 
logical implication, 148, 215 
logical symbol, 116 
logics 
equivalent, 57 
logics (plural), 57 


machine, 233 

mathematical object, 114 
maximal consistent theory, 227 
meaning, 3 

meaningless symbols, 27 
mechanical procedure, 232 
mechanical process, 7 

member of notation: A € &, 45 
metalogic, 4 

metamathematics, xii 
metaproof, 82

metaprove, 45, 58 
metatheorem, xiv, 7, 19, 45
use of the term, 51
vs. theorem, 7
metatheorem schema, 46
metatheoretical, xiv
metatheory, xii, 4, 30
of propositional logic, 26
metavariable, 9, 30, 39
Boolean, 117
for terms, 120
object, 117
model, 215, 223, 231
model of computation, 236
models of computation, xiii
modus ponens, 147
monotonicity, 160


N-Henkin, 227
n-tuple, 216
symbol for: $\vec{x}_n$, 216
natural deduction, 79
nondeterministic
certification of tautologies, 37
nondeterministically, 32
nonlogical symbol, 116
examples of, 116
normal form theorem, 257
normal form theorems, xvi


object, 114
variable, 115
object constant, 118
occurrence
of a variable, 127
is bound, 127
occurrence, see variable 31
ω-consistency, 274
one point rule, 174, 175
Oracle at Delphi, 124
ordered sequence, 10


P vs. NP, 8
P-derivation, 258
P-formula, 149
P-formula-calculation, 149
parse, 37
parsing
bottom-up, 14
top-down, 14
partial generalization, 138, 143
partial translation of a formula, 198
Pascal, 128, 233
Pasch, 28
Peano, 114
Peano arithmetic, xii
Peano axioms, 170
Péter, 232
φ-index, 258
Ping-Pong argument, 102, 158, 166
PL/1, 65
plus A notation, 81
Γ + A ⊢ B, 81
Post’s theorem, xv, 89, 93, 140, 152, 171, 178, 211, 224
PR-derivation, 243
PR-function, 243
precedence, 15
predecessor function, 238
predicate, 115, 247
symbol, 118
symbol for, 116
prefix, 20
premises
of a tautological implication, 34
prenex operations, 170
prime formula, 130
primitive recursion, 240
closed under, 241
priority, 15
of substitution, 36
revisited, 125
problem, 261
recursively unsolvable, 261
undecidable, 261
unsolvable, 261
Program verification, xiv
programming language, 4
Programming Language One, 65
Projection theorem, 259
proof, 3, 4
Hilbert, 6
by auxiliary constant, 190
by contradiction, 86
by induction on (formal) proof length, 84, 91
by induction on formulae, 82
by intimidation, 136
by simple induction, 18
by strong induction, 18
calculational, xiii
circular, 58
definition of, 43
equational, xiii, 6
extended, 171
from Γ, 43
Hilbert-style, xiii
in equational style, 60
layout, 63
in first-order logic, 145
in Hilbert style, 45, 146
proper subtraction, 238
symbol for: $x \dot{-} y$, 238
property, 17
propositions, 25 
prove, xi 


quantifier, xiv, 114 
alternative notations, 124 
bounded, 175 
existential, xvi, 123 

symbol for: ∃, 123 
intuitive meaning, 124 
pronunciations, 124 
relativized, 175 
universal, xvi, 118, 121 

symbol: ∀, 118 

symbol for: ∀, 115 

Quine, 268 


r.e., 263 
recurrence, 253 
recursion 
on formulae 
definition by, 22 
recursively enumerable set, 263 
redundant true, 59, 60, 146 
refute, 5 
regular set, 94 
relation 
arithmetical, xvi, 275 
computable, xvi 
computably enumerable, xvi 
definable, 272 
first-order definable, 216 
primitive recursive, 247 
set of: PR_*, 247 
recursive, 247 
set of: R_*, 247 
semi-computable, xvi, 259 
set of: P_*, 259 
resolution, xv, 79, 104 
Riemann, 34 
right associative, 16 
Rogers, 258 
Rogers’s φ-notation, 258 
Rosser, 274 
rule, 39 
∀ introduction, 179 
annotation of in proofs, 48 
BL, 144 
Boolean Leibniz, 144 
cut, 79 
derived, 99 
applicability of, 53 
transitivity, 47 
dual of specialization, 179 
∃ introduction, 179 
Eqn/Leib, 57 
in Boolean logic 
Equanimity (Eqn), 40 
Leibniz (Leib), 40 
primary, 40 
in predicate logic 
primary, 144 
Leib/Eqn, 57 
Leibniz, xvi 
modus ponens, 79 
MP, 79 
of inference, 39, 117 
conclusion of, 40 
denominator of, 40 
derived, 39, 41 
instance of, 40 
numerator of, 40 
premise(s) of, 40 
primitive, 39 
secondary, 41 
substitution, 42 
of reasoning, 39 
primary 
of predicate logic, 143 
proof by cases, 81 
propositional, 143 
SL, 166 
specialization, 157 
strong Leibniz, 166 
substitution, 101 
transitivity of →, 80 
weak Leibniz, 165 
WL, 165 
rules of inference, xv 
Russell, xi 
Russell’s paradox, 185 


satisfiable 


set of first-order formulae, 215 


schema, 39 
instance of, 39 
schemas, 39 
schemata, 39 
Schneider, xi 
scope, 121, 127 
semantic implication, 215 
semantic methods, xiii 
semantics, xv, 4, 195 
first-order, 195 
Tarski-like, xvi 
semi-computable, 259 
semi-index, 259 
sentence, 270 
set 
c.e., 262 
computably enumerable, 262 
r.e., 263 
recursively enumerable, 263 
set difference, 229 
symbol for: N − D, 229 
SFL, 100, 191 
Shannon, 103 
Shepherdson, xvi, 232 
simultaneous substitution, 162, 163 
single-formula Leibniz, 191 
Smullyan, 268 
software engineering, xiv 
requirements, xiv 
soundness, xiii, xiv, 90, 91, 221 
for Boolean logic, 89 
in first-order logic, 204, 221 
spec, 158 
specialization, 140 
specialization rule, 157 
standard models, xii 
state, 26 
finite (appropriate), 31 
statement, 4 
string, 10 
empty, It, 20 
prefix of, 20 
proper prefix of, 20 
variable, 10 
name of, 10 
strong generalization, 147 
Strong Leibniz with conditional substitution, 167 
strong projection theorem, 259 
structured programming, 61 
Sturgis, xvi, 232 
subformula, 15, 22, 33 
definition of, 128 
prime, 130 
subroutine, 37 
subset symbol: Γ ⊆ Δ, 51 
substituting equivalents for equivalents, 166 
substitution, 36, 131 
cascading, 36 
conditional 
of formula into a Boolean variable, 134 
symbol for: [p := A], 134 
left associative, 36 
of terms into variables, 132 


symbol for: [x := t], 132 
simultaneous, 162, 163 
A[p := B], 36 
unconditional 
of formula into a Boolean variable, 134 
symbol for: [p \ A], 134 
substitution theorem, 164 
substring, 10 
symbol 
for equality, 118 
logical, 116, 117 
nonlogical, 116 
symmetry of =, 42 
symmetry of V, 43 
syntactic constructs, 4 
syntactic variable, 120 


t, 26 
t, 26 
T-predicate, 254 
Tarski, xvi, 195 
Tarski semantics, xvi, 195, 212-214 
Tarski’s trick, 273 
⊨taut A, 32 
tautological implication 
in first-order logic, 131 
tautologically equivalent, 50 
tautologically implies, 34 
Γ ⊨taut A, 34 
tautology, 7, 32 
in first-order logic, 131 
term, 119, 120 
complexity of, 120 
recursive definition of, 120 
vs. object, 119 
term-calculation, 119 
term-parse, 119 
terminating computation, 256 
theorem, xi, 5 
absolute, 44, 145 
definition of, 44 
first-order, 145 
absolute, 145 
inductive definition of, 146 
logical, 145 
from Γ, 44 
inductive definition of, 45 
logical, 44, 145 
quoting in a proof, 54 
A is: ⊢ A, 44 
use of the term, 51 
theorem verifier, 7 
theorem-calculation, 43 


from Γ, 43 

in first-order logic, 145 
theory, xii, 116, 195, 223 

consistent, 224, 231 

deductively closed, 227 

maximal consistent, 227 

N-Henkin, 227 

of computation, xvi 

of models, xii 

rich, xii 


that distinguishes constants, 228 


TM, 233 
top, 9 
⊤, 9 
Tourlakis, 268 
transitivity 
of ⊢, 52 
translator mapping, 196 
true, xi, 26 
redundant, 59 
truth 
absolute, 37, 212 
in an interpretation, 201, 215 
relative, 37 
truth table, 7, 29 
truth value, 30 
truth values, 26 
tuple, 216 
Turing machine, 233 


unbounded register machine, 233 
unbounded register machines, xvi 
unbounded search, 241 
uncomputability, xiii 
undecidable sentence, 274 
underlying set, 196 
unfeasible problem, 7 
union, 36 
infinite: ,59 An. 96 
∪, 36 
unique readability, 21 
of first-order formulae, 150 
universal principles, 140 
universal program, 254 
UNIX, 94 
unprovability, xiii 
unsatisfiable set, 34 
URM, xvi, 233 
instruction 
current, 234 
commands, 234 
computation, 234 
halting, 235 
of a function, 235 
terminating, 235 
computations of, 254 
instructions, 234 
universal, xvi 
variable, 233 


value, 30 
variable, 9 
Boolean, 9, 94, 115, 117 
bound, 126, 215 
free, 127, 215 
fresh, 47 
infinite supply of, 47 
object, 115, 117 
symbol for, 115 
occurrence of, 31 
propositional, 9 
sentential, 9 
syntactic, 9, 39, 40 
for terms, 120 
variant theorem, 170 
verifier, xvi, 260 
of theorems, 7 
verum, 9 


weak generalization, 155 
weak Leibniz, 165 
Weak Leibniz with unconditional substitution, 165 

well-formed-formula, 11, 12 
WFF, 12, 123 

recursive definition, 13 
wff, 11, 12, 123 
Whitehead, xi 
Wilder, xv 
winding road sign, 3 
word, 10 


zero function, 238 




